[jira] [Commented] (YARN-10688) ClusterMetrics should support GPU capacity related metrics.

2021-03-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303115#comment-17303115
 ] 

Hadoop QA commented on YARN-10688:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
37s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
42s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  8s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
24s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/806/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 57s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} 

[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303097#comment-17303097
 ] 

Hadoop QA commented on YARN-10674:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
39s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
56s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 10s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m 
21s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 13 unchanged - 7 fixed = 13 total (was 20) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  2s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| 

[jira] [Updated] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9618:
-
Attachment: YARN-9618.004.patch

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch
>
>
> In the current implementation, NodeListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 1K 
> events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.
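
For illustration, a minimal sketch of the proposed direction using YARN's 
{{AsyncDispatcher}}; the handler and field names are assumptions for the 
sketch, not taken from the patch:
{code:java}
// Sketch only (names assumed): give RMApp events their own async dispatcher
// so NodeListManager's per-app fan-out does not block the central dispatcher.
AsyncDispatcher rmAppEventDispatcher = new AsyncDispatcher();
rmAppEventDispatcher.register(RMAppEventType.class,
    new ApplicationEventDispatcher(rmContext));
rmAppEventDispatcher.init(conf);
rmAppEventDispatcher.start();
// NodeListManager would then post its node-usable/unusable app events to
// this dedicated dispatcher instead of the RM's central one.
{code}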



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10616) Nodemanagers cannot detect GPU failures

2021-03-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303044#comment-17303044
 ] 

Qi Zhu commented on YARN-10616:
---

[~ebadger] [~ztang]

Actually, we can achieve that with the graceful decommission approach:

"We will use {{updateNodeResource}} to set the node resources to 0, meaning 
that nothing will get scheduled on the node. But the NM will still be running 
so that we can jstack or grab a heap dump."

I think we can implement the NM-RM heartbeat approach first, and then handle 
the updateNodeResource interaction.

What is your advice on this?
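
For reference, the admin command being discussed (syntax per the YARN CLI 
docs; the node ID is a placeholder):
{code:java}
# Zero out a node's resources so nothing new gets scheduled there, while the
# NM stays up for debugging (jstack, heap dumps):
yarn rmadmin -updateNodeResource <host:port> 0 0
{code}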

> Nodemanagers cannot detect GPU failures
> ---
>
> Key: YARN-10616
> URL: https://issues.apache.org/jira/browse/YARN-10616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>
> As stated above, the bug is that GPUs can fail, but the NM doesn't notice the 
> failure. The NM will continue to schedule tasks onto the failed GPU, but the 
> GPU won't actually work and so the container will likely fail or run very 
> slowly on the CPU. 
> My initial thought on solving this is to add NM resource capabilities to the 
> NM-RM heartbeat and have the RM update its view of the NM's resource 
> capabilities on each heartbeat. This would be a fairly trivial change, but 
> comes with the unfortunate side effect that it completely undermines {{yarn 
> rmadmin -updateNodeResource}}. When you run {{-updateNodeResource}} the 
> assumption is that the node will retain these new resource capabilities until 
> either the NM or RM is restarted. But with a heartbeat interaction constantly 
> updating those resource capabilities from the NM perspective, the explicit 
> changes via {{-updateNodeResource}} would be lost on the next heartbeat. We 
> could potentially add a flag to ignore the heartbeat updates for any node that 
> has had {{-updateNodeResource}} called on it (until a re-registration). But 
> in this case, the node would no longer get resource capability updates until 
> the NM or RM restarted. If {{-updateNodeResource}} is used a decent amount, 
> then that would give potentially unexpected behavior in relation to nodes 
> properly auto-detecting failures.
> Another idea is to add a GPU monitor thread on the NM to periodically run 
> {{nvidia-smi}} and detect changes in the number of healthy GPUs. If that 
> number decreased, the node would hook into the health check status and mark 
> itself as unhealthy. The downside of this approach is that a single failed 
> GPU would mean taking out an entire node (e.g. 8 GPUs).
> I would really like to go with the NM-RM heartbeat approach, but the 
> {{-updateNodeResource}} issue bothers me. The second approach is ok I guess, 
> but I also don't like taking down whole GPU nodes when only a single GPU is 
> bad. I would like to hear others' thoughts on how best to approach this.
> [~jhung], [~leftnoteasy], [~sunilg], [~epayne], [~Jim_Brennan]
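
For illustration, a minimal sketch of the second (monitor-thread) idea; 
{{expectedGpus}} and {{markNodeUnhealthy}} are hypothetical stand-ins for the 
NM health-check hooks:
{code:java}
// Sketch only: a periodic NM-side probe that counts healthy GPUs via
// nvidia-smi and flags the node unhealthy when the count drops.
ScheduledExecutorService probe = Executors.newSingleThreadScheduledExecutor();
probe.scheduleWithFixedDelay(() -> {
  try {
    Process p = new ProcessBuilder("nvidia-smi", "--list-gpus").start();
    long gpus = new BufferedReader(
        new InputStreamReader(p.getInputStream())).lines().count();
    if (gpus < expectedGpus) {
      markNodeUnhealthy("GPU count dropped to " + gpus); // hypothetical hook
    }
  } catch (IOException e) {
    // nvidia-smi missing or failed; also a candidate for unhealthy status
  }
}, 1, 1, TimeUnit.MINUTES);
{code}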



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10688) ClusterMetrics should support GPU capacity related metrics.

2021-03-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303039#comment-17303039
 ] 

Qi Zhu commented on YARN-10688:
---

Thanks [~ebadger] for confirming.

I also think it is more reasonable to remove the private modifier.

Updated in the latest patch.

 

> ClusterMetrics should support GPU capacity related metrics.
> ---
>
> Key: YARN-10688
> URL: https://issues.apache.org/jira/browse/YARN-10688
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: metrics, resourcemanager
>Affects Versions: 3.2.2, 3.4.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10688.001.patch, YARN-10688.002.patch, 
> YARN-10688.003.patch, YARN-10688.004.patch, image-2021-03-11-15-35-49-625.png
>
>
> Currently, ClusterMetrics only supports memory and vcore related metrics.
>  
> {code:java}
> @Metric("Memory Utilization") MutableGaugeLong utilizedMB;
> @Metric("Vcore Utilization") MutableGaugeLong utilizedVirtualCores;
> @Metric("Memory Capability") MutableGaugeLong capabilityMB;
> @Metric("Vcore Capability") MutableGaugeLong capabilityVirtualCores;
> {code}
>  
>  
> !image-2021-03-11-15-35-49-625.png|width=593,height=253!
> In our cluster, we added GPU support, so I think the GPU related metrics 
> should also be supported by ClusterMetrics.
>  
> cc [~pbacsko]  [~Jim_Brennan]  [~ebadger]  [~gandras]  
>  
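
For illustration, a minimal sketch of what a GPU capability gauge could look 
like next to the existing ones; the helper name and update path are 
assumptions, not the actual patch:
{code:java}
@Metric("GPU Capability") MutableGaugeLong capabilityGPUs;

// Hypothetical helper: add a node's GPU count to the cluster-wide gauge,
// reading the standard "yarn.io/gpu" resource from the node's Resource.
public void incrGpuCapability(Resource res) {
  for (ResourceInformation ri : res.getResources()) {
    if (ResourceInformation.GPU_URI.equals(ri.getName())) {
      capabilityGPUs.incr(ri.getValue());
    }
  }
}
{code}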



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10688) ClusterMetrics should support GPU capacity related metrics.

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10688:
--
Attachment: YARN-10688.004.patch

> ClusterMetrics should support GPU capacity related metrics.
> ---
>
> Key: YARN-10688
> URL: https://issues.apache.org/jira/browse/YARN-10688
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: metrics, resourcemanager
>Affects Versions: 3.2.2, 3.4.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10688.001.patch, YARN-10688.002.patch, 
> YARN-10688.003.patch, YARN-10688.004.patch, image-2021-03-11-15-35-49-625.png
>
>
> Currently, ClusterMetrics only supports memory and vcore related metrics.
>  
> {code:java}
> @Metric("Memory Utilization") MutableGaugeLong utilizedMB;
> @Metric("Vcore Utilization") MutableGaugeLong utilizedVirtualCores;
> @Metric("Memory Capability") MutableGaugeLong capabilityMB;
> @Metric("Vcore Capability") MutableGaugeLong capabilityVirtualCores;
> {code}
>  
>  
> !image-2021-03-11-15-35-49-625.png|width=593,height=253!
> In our cluster, we added GPU support, so I think the GPU related metrics 
> should also be supported by ClusterMetrics.
>  
> cc [~pbacsko]  [~Jim_Brennan]  [~ebadger]  [~gandras]  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303024#comment-17303024
 ] 

Qi Zhu commented on YARN-10674:
---

[~pbacsko]

Fixed the checkstyle issue in the latest patch. :D

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
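
For reference, the FS behavior quoted above amounts to a fixed-delay periodic 
check; a minimal standalone sketch, where {{queueMgr}} and {{lock}} stand in 
for FS internals:
{code:java}
// Illustrative only: the reload loop above as a fixed-delay periodic task.
ScheduledExecutorService checker =
    Executors.newSingleThreadScheduledExecutor();
checker.scheduleWithFixedDelay(() -> {
  synchronized (lock) {
    queueMgr.removeEmptyDynamicQueues();
    queueMgr.removePendingIncompatibleQueues();
  }
}, 10, 10, TimeUnit.SECONDS); // ALLOC_RELOAD_INTERVAL_MS = 10 * 1000
{code}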



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10674:
--
Attachment: YARN-10674.012.patch

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10699) Document the fact that usage of usernames/groupnames with a "." (dot) is strictly not recommended

2021-03-16 Thread Siddharth Ahuja (Jira)
Siddharth Ahuja created YARN-10699:
--

 Summary: Document the fact that usage of usernames/groupnames with 
a "." (dot) is strictly not recommended
 Key: YARN-10699
 URL: https://issues.apache.org/jira/browse/YARN-10699
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: docs, documentation
Reporter: Siddharth Ahuja


Based on discussions in YARN-10652, it is clear that usage of a "." (dot) in a 
username/groupname (e.g. users in AD/LDAP) can cause unexpected issues e.g. 
[placement rules involving username (%user placeholder) will definitely exhibit 
unexpected 
behavior|https://issues.apache.org/jira/browse/YARN-10652?focusedCommentId=17295964&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17295964].

As such, we need to document clearly for our customers that this use-case is 
strictly not recommended in CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10699) Document the fact that usage of usernames/groupnames with a "." (dot) is strictly not recommended

2021-03-16 Thread Siddharth Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Ahuja reassigned YARN-10699:
--

Assignee: Siddharth Ahuja

> Document the fact that usage of usernames/groupnames with a "." (dot) is 
> strictly not recommended
> -
>
> Key: YARN-10699
> URL: https://issues.apache.org/jira/browse/YARN-10699
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: docs, documentation
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
>
> Based on discussions in YARN-10652, it is clear that usage of a "." (dot) in 
> a username/groupname (e.g. users in AD/LDAP) can cause unexpected issues e.g. 
> [placement rules involving username (%user placeholder) will definitely 
> exhibit unexpected 
> behavior|https://issues.apache.org/jira/browse/YARN-10652?focusedCommentId=17295964&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17295964].
> As such, we need to document clearly for our customers that this use-case is 
> strictly not recommended in CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10652) Capacity Scheduler fails to handle user weights for a user that has a "." (dot) in it

2021-03-16 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302978#comment-17302978
 ] 

Wilfred Spiegelenburg commented on YARN-10652:
--

Thank you to [~sahuja] for the fix, and to all ([~snemeth], [~shuzirra], 
[~gandras] & [~pbacsko]) for the discussion and resolution around this jira.

I committed to trunk with a comment in the commit message:
{quote}This only fixes the user name resolution for weights in the queues. It 
does not add generic support for user names with dots in all use cases in the 
capacity scheduler.
{quote}

> Capacity Scheduler fails to handle user weights for a user that has a "." 
> (dot) in it
> -
>
> Key: YARN-10652
> URL: https://issues.apache.org/jira/browse/YARN-10652
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: Correct user weight of 0.76 picked up for the user with 
> a dot after the patch.png, Incorrect default user weight of 1.0 being picked 
> for the user with a dot before the patch.png, YARN-10652.001.patch
>
>
> AD usernames can have a "." (dot) in them i.e. they can be of the format -> 
> {{firstname.lastname}}. However, if you specify a username with this format 
> against the Capacity Scheduler setting -> 
> {{yarn.scheduler.capacity.root.default.user-settings.firstname.lastname.weight}},
>  it fails to be applied and is instead assigned the default of 1.0f weight. 
> This renders the user weight feature (being used as a means of setting user 
> priorities for a queue) unusable for such users.
> This limitation comes from [1]. From [1], only word characters (A word 
> character: [a-zA-Z_0-9]) (see [2]) are permissible at the moment which is no 
> good for AD names that contain a "." (dot).
> Similar discussion has been had in a few HADOOP jiras e.g. HADOOP-7050 and 
> HADOOP-15395 and the outcome was to use non-whitespace characters i.e. 
> instead of {{\w+}}, use {{\S+}}.
> We could go down a similar path and unblock this feature for the AD usernames 
> with a "." (dot) in them.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java#L1953
> [2] 
> https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html
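
To see why the {{\w+}} -> {{\S+}} change described above matters, a 
self-contained demo (not code from the patch):
{code:java}
import java.util.regex.Pattern;

public class UserWeightRegexDemo {
  public static void main(String[] args) {
    String user = "firstname.lastname";
    // \w+ only matches [a-zA-Z_0-9], so the dot breaks the match:
    System.out.println(Pattern.matches("\\w+", user)); // false
    // \S+ matches any non-whitespace run, so AD-style names work:
    System.out.println(Pattern.matches("\\S+", user)); // true
  }
}
{code}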



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-16 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302956#comment-17302956
 ] 

Eric Badger commented on YARN-10503:


One initial question I have is whether we should generalize this to any 
resource type (e.g. GPU, FPGA, etc). GPU already isn't a first-class resource 
in YARN. If we aren't going to make it one, then I think it would be prudent to 
make these additions generalized to all arbitrary resources instead of just GPUs

> Support queue capacity in terms of absolute resources with gpu resourceType.
> 
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch
>
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very important for cluster scaling when there are absolute demands for 
> different resourceTypes.
>  
> This Jira will handle GPU first.
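
One possible direction for the generalization discussed here, sketched under 
the assumption that the cluster's registered resource types drive the set of 
absolute capacities (illustrative only, not the patch):
{code:java}
// Illustrative sketch: derive absolute-capacity resource types from the
// cluster's registered resources (memory-mb, vcores, yarn.io/gpu, ...)
// instead of the fixed MEMORY/VCORES enum.
for (String resourceType : ResourceUtils.getResourceTypes().keySet()) {
  // parse the absolute capacity configured for resourceType, e.g.
  // [memory=10240, vcores=10, yarn.io/gpu=4]
}
{code}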



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10692) Add Node GPU Utilization and apply to NodeMetrics.

2021-03-16 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302931#comment-17302931
 ] 

Eric Badger commented on YARN-10692:


[~zhuqi], it looks like the unit test failure from Hadoop QA is related to the 
patch. Additionally, there are no unit tests added for the patch. I think it 
would be good to add tests to TestNodeManagerMetrics.

> Add Node GPU Utilization and apply to NodeMetrics.
> --
>
> Key: YARN-10692
> URL: https://issues.apache.org/jira/browse/YARN-10692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10692.001.patch
>
>
> Currently there is no node-level GPU utilization metric; this issue will add 
> one, applying it to NodeMetrics first.
> cc [~pbacsko]  [~Jim_Brennan]  [~ebadger]  [~gandras]  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10688) ClusterMetrics should support GPU capacity related metrics.

2021-03-16 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302893#comment-17302893
 ] 

Eric Badger commented on YARN-10688:


{noformat}
   @Metric("Vcore Utilization") MutableGaugeLong utilizedVirtualCores;
   @Metric("Memory Capability") MutableGaugeLong capabilityMB;
   @Metric("Vcore Capability") MutableGaugeLong capabilityVirtualCores;
+  @Metric("GPU Capability")
+  private MutableGaugeLong capabilityGPUs;
{noformat}
To maintain consistency, I would actually remove the private here and let the 
checkstyle warning exist. I would prefer to update the checkstyle for them all 
in a separate JIRA. But I think consistency is most important. Other than that, 
the patch looks good to me

> ClusterMetrics should support GPU capacity related metrics.
> ---
>
> Key: YARN-10688
> URL: https://issues.apache.org/jira/browse/YARN-10688
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: metrics, resourcemanager
>Affects Versions: 3.2.2, 3.4.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10688.001.patch, YARN-10688.002.patch, 
> YARN-10688.003.patch, image-2021-03-11-15-35-49-625.png
>
>
> Currently, ClusterMetrics only supports memory and vcore related metrics.
>  
> {code:java}
> @Metric("Memory Utilization") MutableGaugeLong utilizedMB;
> @Metric("Vcore Utilization") MutableGaugeLong utilizedVirtualCores;
> @Metric("Memory Capability") MutableGaugeLong capabilityMB;
> @Metric("Vcore Capability") MutableGaugeLong capabilityVirtualCores;
> {code}
>  
>  
> !image-2021-03-11-15-35-49-625.png|width=593,height=253!
> In our cluster, we added GPU support, so I think the GPU related metrics 
> should also be supported by ClusterMetrics.
>  
> cc [~pbacsko]  [~Jim_Brennan]  [~ebadger]  [~gandras]  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10370) [Umbrella] Reduce the feature gap between FS Placement Rules and CS Queue Mapping rules

2021-03-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302878#comment-17302878
 ] 

Peter Bacsko edited comment on YARN-10370 at 3/16/21, 8:36 PM:
---

[~shuzirra] [~snemeth] the vast majority of tasks in this JIRA are done. There 
are some open tasks left.

I think it's safe to say that this feature is ready and the remaining tasks can 
be completed either as standalone tasks or under a "Part II" JIRA. Otherwise we 
might need to keep this open for a long time.

IMO we should move the open / patch available tasks under a new umbrella and 
resolve this, marked with a proper Fix version.

Opinions?


was (Author: pbacsko):
[~shuzirra] [~snemeth] the vast majority of tasks in this JIRA are done. There 
are some open tasks left.

I think it's safe to say that the umbrella is done and the remaining tasks can 
be completed either as standalone tasks or under a "Part II" JIRA. Otherwise we 
might need to keep this open for a long time.

IMO we should move the open / patch available tasks under a new umbrella and 
resolve this, marked with a proper Fix version.

Opinions?

> [Umbrella] Reduce the feature gap between FS Placement Rules and CS Queue 
> Mapping rules
> ---
>
> Key: YARN-10370
> URL: https://issues.apache.org/jira/browse/YARN-10370
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: capacity-scheduler, capacityscheduler
> Attachments: MappingRuleEnhancements.pdf, Possible extensions of 
> mapping rule format in Capacity Scheduler.pdf
>
>
> To continue closing the feature gaps between Fair Scheduler and Capacity 
> Scheduler and to help users migrate between the schedulers more easily, we need to 
> add some of the Fair Scheduler placement rules to the capacity scheduler's 
> queue mapping functionality.
> With [~snemeth] and [~pbacsko] we've created the following design docs about 
> the proposed changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10370) [Umbrella] Reduce the feature gap between FS Placement Rules and CS Queue Mapping rules

2021-03-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302878#comment-17302878
 ] 

Peter Bacsko commented on YARN-10370:
-

[~shuzirra] [~snemeth] the vast majority of tasks in this JIRA are done. There 
are some open tasks left.

I think it's safe to say that the umbrella is done and the remaining tasks can 
be completed either as standalone tasks or under a "Part II" JIRA. Otherwise we 
might need to keep this open for a long time.

IMO we should move the open / patch available tasks under a new umbrella and 
resolve this, marked with a proper Fix version.

Opinions?

> [Umbrella] Reduce the feature gap between FS Placement Rules and CS Queue 
> Mapping rules
> ---
>
> Key: YARN-10370
> URL: https://issues.apache.org/jira/browse/YARN-10370
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: capacity-scheduler, capacityscheduler
> Attachments: MappingRuleEnhancements.pdf, Possible extensions of 
> mapping rule format in Capacity Scheduler.pdf
>
>
> To continue closing the feature gaps between Fair Scheduler and Capacity 
> Scheduler and to help users migrate between the schedulers more easily, we need to 
> add some of the Fair Scheduler placement rules to the capacity scheduler's 
> queue mapping functionality.
> With [~snemeth] and [~pbacsko] we've created the following design docs about 
> the proposed changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10493) RunC container repository v2

2021-03-16 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reassigned YARN-10493:
---

Assignee: Matt Sharp  (was: Craig Condit)

> RunC container repository v2
> 
>
> Key: YARN-10493
> URL: https://issues.apache.org/jira/browse/YARN-10493
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, yarn
>Affects Versions: 3.3.0
>Reporter: Craig Condit
>Assignee: Matt Sharp
>Priority: Major
> Attachments: runc-container-repository-v2-design.pdf
>
>
> The current runc container repository design has scalability and usability 
> issues which will likely limit widespread adoption. We should address this 
> with a new, V2 layout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10494) CLI tool for docker-to-squashfs conversion (pure Java)

2021-03-16 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reassigned YARN-10494:
---

Assignee: Matt Sharp  (was: Craig Condit)

> CLI tool for docker-to-squashfs conversion (pure Java)
> --
>
> Key: YARN-10494
> URL: https://issues.apache.org/jira/browse/YARN-10494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Craig Condit
>Assignee: Matt Sharp
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-10494.001.patch, 
> docker-to-squashfs-conversion-tool-design.pdf
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> *YARN-9564* defines a docker-to-squashfs image conversion tool that relies on 
> python2, multiple libraries, squashfs-tools and root access in order to 
> convert Docker images to squashfs images for use with the runc container 
> runtime in YARN.
> *YARN-9943* was created to investigate alternatives, as the response to 
> merging YARN-9564 has not been very positive. This proposal outlines the 
> design for a CLI conversion tool in 100% pure Java that will work out of the 
> box.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10616) Nodemanagers cannot detect GPU failures

2021-03-16 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302864#comment-17302864
 ] 

Eric Badger commented on YARN-10616:


bq. For the "updateNodeResource" issue, one question is that is it a frequently 
used operation? I'm not ware of the scenario that we use this often.
[~ztang], we use this feature internally. Maybe once or twice a day across all 
of our clusters. Usually to quickly remove a node from a cluster while we 
investigate why it's running slow or causing errors. We will use 
{{updateNodeResource}} to set the node resources to 0, meaning that nothing 
will get scheduled on the node. But the NM will still be running so that we can 
jstack or grab a heap dump. For us at least, the only time we ever use this 
operation is to remove a node from the cluster. So maybe there's a different 
way that we could do that such that it doesn't mess with the node resources. 
Because this really is just a simple hack to get the node to not schedule 
anything else.

> Nodemanagers cannot detect GPU failures
> ---
>
> Key: YARN-10616
> URL: https://issues.apache.org/jira/browse/YARN-10616
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>
> As stated above, the bug is that GPUs can fail, but the NM doesn't notice the 
> failure. The NM will continue to schedule tasks onto the failed GPU, but the 
> GPU won't actually work and so the container will likely fail or run very 
> slowly on the CPU. 
> My initial thought on solving this is to add NM resource capabilities to the 
> NM-RM heartbeat and have the RM update its view of the NM's resource 
> capabilities on each heartbeat. This would be a fairly trivial change, but 
> comes with the unfortunate side effect that it completely undermines {{yarn 
> rmadmin -updateNodeResource}}. When you run {{-updateNodeResource}} the 
> assumption is that the node will retain these new resource capabilities until 
> either the NM or RM is restarted. But with a heartbeat interaction constantly 
> updating those resource capabilities from the NM perspective, the explicit 
> changes via {{-updateNodeResource}} would be lost on the next heartbeat. We 
> could potentially add a flag to ignore the heartbeat updates for any node that 
> has had {{-updateNodeResource}} called on it (until a re-registration). But 
> in this case, the node would no longer get resource capability updates until 
> the NM or RM restarted. If {{-updateNodeResource}} is used a decent amount, 
> then that would give potentially unexpected behavior in relation to nodes 
> properly auto-detecting failures.
> Another idea is to add a GPU monitor thread on the NM to periodically run 
> {{nvidia-smi}} and detect changes in the number of healthy GPUs. If that 
> number decreased, the node would hook into the health check status and mark 
> itself as unhealthy. The downside of this approach is that a single failed 
> GPU would mean taking out an entire node (e.g. 8 GPUs).
> I would really like to go with the NM-RM heartbeat approach, but the 
> {{-updateNodeResource}} issue bothers me. The second approach is ok I guess, 
> but I also don't like taking down whole GPU nodes when only a single GPU is 
> bad. I would like to hear others' thoughts on how best to approach this.
> [~jhung], [~leftnoteasy], [~sunilg], [~epayne], [~Jim_Brennan]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302860#comment-17302860
 ] 

Eric Badger commented on YARN-9618:
---

The patch looks reasonable to me. Agree with [~gandras] that some stress 
testing should be done before committing

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch
>
>
> In the current implementation, NodeListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 1K 
> events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10652) Capacity Scheduler fails to handle user weights for a user that has a "." (dot) in it

2021-03-16 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302845#comment-17302845
 ] 

Szilard Nemeth commented on YARN-10652:
---

Hi [~sahuja],
Answering your comment from 
[here|https://issues.apache.org/jira/browse/YARN-10652?focusedCommentId=17295634&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17295634].
1. This might be tough to implement but [~pbacsko] and [~shuzirra] know the 
internals of the placement engine better than myself.
2. I think it's okay to have it documented, so I'd choose this from your 
suggestions. Could you please file a jira for this?
3. This is also a good idea. 

Furthermore, can you file a follow-up jira (you can file more if necessary) as 
suggested by [Peter's 
comment|https://issues.apache.org/jira/browse/YARN-10652?focusedCommentId=17295964&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17295964]
 to cover the problematic cases we already discovered during code inspection 
and while having our discussion here?
All in all, if you file follow-up jiras to make this use-case more stable and 
consistent, I'm fine.
So, I'm giving +1 (binding) for your patch.

[~wilfreds] I get your point with the last comment. 
Based on my comment above: As you wanted to commit this in the first place, 
please go ahead with committing.
Thanks.

> Capacity Scheduler fails to handle user weights for a user that has a "." 
> (dot) in it
> -
>
> Key: YARN-10652
> URL: https://issues.apache.org/jira/browse/YARN-10652
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: Correct user weight of 0.76 picked up for the user with 
> a dot after the patch.png, Incorrect default user weight of 1.0 being picked 
> for the user with a dot before the patch.png, YARN-10652.001.patch
>
>
> AD usernames can have a "." (dot) in them i.e. they can be of the format -> 
> {{firstname.lastname}}. However, if you specify a username with this format 
> against the Capacity Scheduler setting -> 
> {{yarn.scheduler.capacity.root.default.user-settings.firstname.lastname.weight}},
>  it fails to be applied and is instead assigned the default of 1.0f weight. 
> This renders the user weight feature (being used as a means of setting user 
> priorities for a queue) unusable for such users.
> This limitation comes from [1]. From [1], only word characters (A word 
> character: [a-zA-Z_0-9]) (see [2]) are permissible at the moment which is no 
> good for AD names that contain a "." (dot).
> Similar discussion has been had in a few HADOOP jiras e.g. HADOOP-7050 and 
> HADOOP-15395 and the outcome was to use non-whitespace characters i.e. 
> instead of {{\w+}}, use {{\S+}}.
> We could go down similar path and unblock this feature for the AD usernames 
> with a "." (dot) in them.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java#L1953
> [2] 
> https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302817#comment-17302817
 ] 

Ahmed Hussein edited comment on YARN-10501 at 3/16/21, 6:55 PM:


That's confusing. I am sure [~aajisaka] has a better clue.
The branch-2.10 dev-support/Jenkinsfile defines 
{{YETUS_ARGS+=("--findbugs-strict-precheck")}}. I do not know where 
{{--spotbugs-strict-precheck}} comes from on branch-2.10 builds.



was (Author: ahussein):
That's confusing. I am sure [~aajisaka] has a better clue.
branch-2.10 -> dev-support/Jenkinsfile defines 
{{YETUS_ARGS+=("--findbugs-strict-precheck")}}. I do not know where 
{{--spotbugs-strict-precheck}} comes from on branch-2.10 builds.


> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch, 
> YARN-10502-branch-2.10.003.patch
>
>
> When adding a label to nodes without a nodemanager port, or using the 
> WILDCARD_PORT (0) port, it is impossible to remove all the label info from 
> these nodes.
> Reproduce process:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the label 
> info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, both 
> the 0 port and the real NM port are added to the node info, and when labels 
> are removed, the node.labels parameter at line 647 is null, so the old label 
> is not removed.
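
A minimal sketch of one possible guard for the null {{node.labels}} described 
above; illustrative only, not the committed fix:
{code:java}
// Snapshot the host's old labels before replacing them, and fall back to
// that snapshot when a node-level label set is null:
Set<String> oldHostLabels = new HashSet<>(host.labels);
host.labels.clear();
host.labels.addAll(labels);
for (Node node : host.nms.values()) {
  Set<String> oldLabels =
      (node.labels != null) ? node.labels : oldHostLabels;
  replaceNodeForLabels(node.nodeId, oldLabels, labels);
  node.labels = null;
}
{code}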



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302817#comment-17302817
 ] 

Ahmed Hussein commented on YARN-10501:
--

That's confusing. I am sure [~aajisaka] has a better clue.
branch-2.10 -> dev-support/Jenkinsfile defines 
{{YETUS_ARGS+=("--findbugs-strict-precheck")}}. I do not know where 
{{--spotbugs-strict-precheck}} comes from on branch-2.10 builds.


> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch, 
> YARN-10502-branch-2.10.003.patch
>
>
> When adding a label to nodes without a nodemanager port, or using the 
> WILDCARD_PORT (0) port, it is impossible to remove all the label info from 
> these nodes.
> Reproduce process:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the label 
> info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, both 
> the 0 port and the real NM port are added to the node info, and when labels 
> are removed, the node.labels parameter at line 647 is null, so the old label 
> is not removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302782#comment-17302782
 ] 

Eric Badger commented on YARN-10501:


[~aajisaka], [~ahussein], most recent builds are failing due to some yetus flag 
errors. Is this a recent change? Do you know how to mitigate it?

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch, 
> YARN-10502-branch-2.10.003.patch
>
>
> When adding a label to nodes without a nodemanager port, or using the 
> WILDCARD_PORT (0) port, it is impossible to remove all the label info from 
> these nodes.
> Reproduce process:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager's labels, the
> label info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, both
> the 0 port and the real NM port are added to the node info, but when labels
> are removed, the node.labels parameter at line 647 is null, so the old label
> is not removed.
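> A minimal sketch of one possible fix (illustrative only, not necessarily the
> committed patch): capture the host's old labels before they are overwritten,
> and fall back to them when an NM entry's own label set is null.
> {code:java}
> case REPLACE:
>   // capture the labels to remove before host.labels is overwritten
>   Set<String> oldHostLabels = new HashSet<>(host.labels);
>   replaceNodeForLabels(nodeId, host.labels, labels);
>   replaceLabelsForNode(nodeId, host.labels, labels);
>   host.labels.clear();
>   host.labels.addAll(labels);
>   for (Node node : host.nms.values()) {
>     // a null node.labels means the NM inherited the host's labels, so
>     // remove the captured host labels instead of skipping the removal
>     Set<String> nodeOldLabels =
>         node.labels != null ? node.labels : oldHostLabels;
>     replaceNodeForLabels(node.nodeId, nodeOldLabels, labels);
>     node.labels = null;
>   }
>   break;
> {code}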



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2021-03-16 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302761#comment-17302761
 ] 

Eric Badger commented on YARN-10495:


[~angerszhu], I don't think it's a good idea to ship glibc with Hadoop. glibc 
is tied very closely to the kernel and if the ABI has changed then it won't 
work. 

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10495.001.patch, YARN-10495.002.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on
> crypto to container-executor. We hit a case where our Jenkins machine has
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager
> machines do not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library path,
> /usr/lib/x86_64-linux-gnu, and we build Hadoop with the following parameters:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> On the Jenkins machine, the shared library path /usr/lib/x86_64-linux-gnu
> (where libcrypto lives) contains:
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> On the nodemanager machines, the same shared library path
> /usr/lib/x86_64-linux-gnu contains:
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
> We build container-executor in this environment, and since the libcrypto.so
> versions are not the same, we get the following error when we start the
> nodemanager:
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make the RPATH of container-executor configurable to solve this
> problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files

2021-03-16 Thread Haibo Chen (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302759#comment-17302759
 ] 

Haibo Chen commented on YARN-1151:
--

cherry-picked to branch-2.10 

> Ability to configure auxiliary services from HDFS-based JAR files
> -
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.9.0
>Reporter: john lilley
>Assignee: Xuan Gong
>Priority: Major
>  Labels: auxiliary-service, yarn
> Fix For: 3.2.0, 3.1.1, 2.10.2
>
> Attachments: YARN-1151.1.patch, YARN-1151.2.patch, YARN-1151.3.patch, 
> YARN-1151.4.patch, YARN-1151.5.patch, YARN-1151.6.patch, 
> YARN-1151.branch-2.poc.2.patch, YARN-1151.branch-2.poc.3.patch, 
> YARN-1151.branch-2.poc.patch, [YARN-1151] [Design] Configure auxiliary 
> services from HDFS-based JAR files.pdf
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files

2021-03-16 Thread Haibo Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-1151:
-
Fix Version/s: 2.10.2

> Ability to configure auxiliary services from HDFS-based JAR files
> -
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.9.0
>Reporter: john lilley
>Assignee: Xuan Gong
>Priority: Major
>  Labels: auxiliary-service, yarn
> Fix For: 3.2.0, 3.1.1, 2.10.2
>
> Attachments: YARN-1151.1.patch, YARN-1151.2.patch, YARN-1151.3.patch, 
> YARN-1151.4.patch, YARN-1151.5.patch, YARN-1151.6.patch, 
> YARN-1151.branch-2.poc.2.patch, YARN-1151.branch-2.poc.3.patch, 
> YARN-1151.branch-2.poc.patch, [YARN-1151] [Design] Configure auxiliary 
> services from HDFS-based JAR files.pdf
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302754#comment-17302754
 ] 

Hadoop QA commented on YARN-10674:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
30s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
32s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 14s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
35s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m  
1s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/803/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 13 unchanged - 7 fixed = 14 total (was 20) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 59s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} 

[jira] [Commented] (YARN-10698) Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10

2021-03-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302681#comment-17302681
 ] 

Hadoop QA commented on YARN-10698:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
36s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} yetus {color} | {color:red}  0m 14s{color} 
| {color:red}{color} | {color:red} Unprocessed flag(s): 
--spotbugs-strict-precheck {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/804/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10698 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13022399/YARN-10698.branch-2.10.00.patch
 |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/804/console |
| versions | git=2.7.4 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10
> -
>
> Key: YARN-10698
> URL: https://issues.apache.org/jira/browse/YARN-10698
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.10.1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-10698.branch-2.10.00.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10698) Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10

2021-03-16 Thread Haibo Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-10698:
--
Attachment: YARN-10698.branch-2.10.00.patch

> Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10
> -
>
> Key: YARN-10698
> URL: https://issues.apache.org/jira/browse/YARN-10698
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.10.1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-10698.branch-2.10.00.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10698) Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10

2021-03-16 Thread Haibo Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-10698:
--
Target Version/s: 2.10.2

> Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10
> -
>
> Key: YARN-10698
> URL: https://issues.apache.org/jira/browse/YARN-10698
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.10.1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302670#comment-17302670
 ] 

Hadoop QA commented on YARN-9618:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
30s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
35s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 58s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m 
18s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
55s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green}{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 58 unchanged - 1 fixed = 58 total (was 59) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 15s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| 

[jira] [Updated] (YARN-10698) Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10

2021-03-16 Thread Haibo Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-10698:
--
Affects Version/s: 2.10.1

> Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10
> -
>
> Key: YARN-10698
> URL: https://issues.apache.org/jira/browse/YARN-10698
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.10.1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10698) Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10

2021-03-16 Thread Haibo Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-10698:
--
Summary: Backport YARN-1151 (load auxiliary service from HDFS archives) to 
branch-2.10  (was: Backport YARN-1151 (load auxiliary service from HDFS 
archives) to branch-2)

> Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2.10
> -
>
> Key: YARN-10698
> URL: https://issues.apache.org/jira/browse/YARN-10698
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302660#comment-17302660
 ] 

Andras Gyori commented on YARN-9618:


Thanks [~zhuqi] for the patch, it seems to be a good scalability improvement. I 
think it has relatively low risk, as dispatching events to their own handlers is 
a common idiom in ResourceManager. This does, however, affect a core part of 
YARN, so we need to be careful here. My addition to the issue (see the sketch 
after this list):
 * Please use fully parameterized types everywhere, like 
EventDispatcher with its concrete event type, or use a wildcard if the type is 
unknown.
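A short illustration of the point (type names are only examples):
{code:java}
// fully parameterized: the handled event type is explicit
EventDispatcher<RMAppEvent> appDispatcher;

// wildcard: acceptable when the concrete event type is unknown here
EventDispatcher<?> someDispatcher;

// avoid the raw type, which silently drops generic type checking
EventDispatcher rawDispatcher;
{code}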

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch
>
>
> In the current implementation, NodeListManager events block the async
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 1K
> events; overall that can be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher directly, call the RMApp event
> handler (a sketch follows below).
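> A minimal sketch of the proposed direction (illustrative names; the actual
> patch may differ), reusing the ResourceManager's EventDispatcher service:
> {code:java}
> // a dedicated async dispatcher for node-list-driven app events, so they
> // no longer queue up behind everything else on the central AsyncDispatcher
> EventDispatcher<RMAppEvent> nodeEventDispatcher =
>     new EventDispatcher<>(rmAppEventHandler, "NodesListManagerEventDispatcher");
> addService(nodeEventDispatcher);
>
> // on a node becoming usable/unusable, notify each running app directly
> for (RMApp app : rmContext.getRMApps().values()) {
>   nodeEventDispatcher.handle(
>       new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, updateType));
> }
> {code}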



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10698) Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2

2021-03-16 Thread Haibo Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-10698:
--
Target Version/s:   (was: 2.10.2)

> Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2
> --
>
> Key: YARN-10698
> URL: https://issues.apache.org/jira/browse/YARN-10698
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10698) Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2

2021-03-16 Thread Haibo Chen (Jira)
Haibo Chen created YARN-10698:
-

 Summary: Backport YARN-1151 (load auxiliary service from HDFS 
archives) to branch-2
 Key: YARN-10698
 URL: https://issues.apache.org/jira/browse/YARN-10698
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10686) Fix TestCapacitySchedulerAutoQueueCreation#testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode

2021-03-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302599#comment-17302599
 ] 

Peter Bacsko commented on YARN-10686:
-

+1

Thanks [~zhuqi] for the patch and [~gandras] for the review. Committed to trunk.

> Fix 
> TestCapacitySchedulerAutoQueueCreation#testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode
> -
>
> Key: YARN-10686
> URL: https://issues.apache.org/jira/browse/YARN-10686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10686.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302595#comment-17302595
 ] 

Qi Zhu commented on YARN-10674:
---

Thanks [~pbacsko] for the valid suggestions.

Updated this in the latest patch. :D

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10686) Fix TestCapacitySchedulerAutoQueueCreation#testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode

2021-03-16 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10686:

Summary: Fix 
TestCapacitySchedulerAutoQueueCreation#testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode
  (was: Fix testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode user 
error.)

> Fix 
> TestCapacitySchedulerAutoQueueCreation#testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode
> -
>
> Key: YARN-10686
> URL: https://issues.apache.org/jira/browse/YARN-10686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10686.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10674:
--
Attachment: YARN-10674.011.patch

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10659) Improve CS MappingRule %secondary_group evaluation

2021-03-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302587#comment-17302587
 ] 

Hadoop QA commented on YARN-10659:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
58s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 29s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
38s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
52s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/801/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 6 unchanged - 1 fixed = 7 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  1s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:red}-1{color} | 

[jira] [Commented] (YARN-10682) The scheduler monitor policies conf should trim values separated by comma

2021-03-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302567#comment-17302567
 ] 

Peter Bacsko commented on YARN-10682:
-

+1

Thanks for the patch [~zhuqi] and [~gandras] for the review, committed to trunk.

> The scheduler monitor policies conf should trim values separated by comma
> -
>
> Key: YARN-10682
> URL: https://issues.apache.org/jira/browse/YARN-10682
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10682.001.patch
>
>
> When I configured the scheduler monitor policies with spaces around the
> commas, the RM failed to start.
> The conf should trim the values separated by "," : "a,b,c" is supported now,
> but "a,   b,  c" is not; this jira just adds the trimming.
> It happened when testing multiple policies, e.g.:
> <property>
>   <name>yarn.resourcemanager.scheduler.monitor.policies</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,
>    org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy</value>
> </property>
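> A minimal sketch of the kind of fix (assuming the standard Hadoop
> Configuration API; not necessarily the exact patch):
> {code:java}
> // getTrimmedStrings() splits on commas and strips surrounding whitespace,
> // so "a,   b,  c" parses the same as "a,b,c"
> String[] policies = conf.getTrimmedStrings(
>     YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES);
> {code}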



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10682) The scheduler monitor policies conf should trim values separated by comma

2021-03-16 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10682:

Summary: The scheduler monitor policies conf should trim values separated 
by comma  (was: The scheduler monitor policies conf should support trim between 
",".)

> The scheduler monitor policies conf should trim values separated by comma
> -
>
> Key: YARN-10682
> URL: https://issues.apache.org/jira/browse/YARN-10682
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10682.001.patch
>
>
> When I configured the scheduler monitor policies with spaces around the
> commas, the RM failed to start.
> The conf should trim the values separated by "," : "a,b,c" is supported now,
> but "a,   b,  c" is not; this jira just adds the trimming.
> It happened when testing multiple policies, e.g.:
> <property>
>   <name>yarn.resourcemanager.scheduler.monitor.policies</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,
>    org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy</value>
> </property>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302548#comment-17302548
 ] 

Peter Bacsko commented on YARN-10674:
-

Thanks [~zhuqi], this definitely looks better. We're close to the final 
version.

Some comments:
 1.
{noformat}
Disable the preemption with nopolicy or observeonly mode, " +
"default mode is nopolicy with no arg." +
"When use nopolicy arg, it means to remove " +
"ProportionalCapacityPreemptionPolicy for CS preemption, " +
"When use observeonly arg, " +
"it means to set " +

"yarn.resourcemanager.monitor.capacity.preemption.observe_only " +
"to true"
{noformat}
I'd like to slightly modify this text:
{noformat}
Disable the preemption with \"nopolicy\" or \"observeonly\" mode.
Default is \"nopolicy\".
\"nopolicy\" removes ProportionalCapacityPreemptionPolicy from
the list of monitor policies.
\"observeronly\" sets 
\"yarn.resourcemanager.monitor.capacity.preemption.observe_only\"
to true.
{noformat}
2. This definition:
 {{private String disablePreemptionMode;}}

This should be a simple enum like:
{noformat}
public enum DisablePreemptionMode {
  OBSERVE_ONLY {
@Override
String getCliOption() {
  return "observeonly";
}
  },
  NO_POLICY {
@Override
String getCliOption() {
  return "nopolicy";
}
  };

  abstract String getCliOption();
}
{noformat}
So you can also use them here:
{noformat}
private static void checkDisablePreemption(CliOption cliOption,
    String disablePreemptionMode) {
  if (disablePreemptionMode == null ||
      disablePreemptionMode.trim().isEmpty()) {
    // The default mode is nopolicy.
    return;
  }

  try {
    DisablePreemptionMode.valueOf(disablePreemptionMode);
  } catch (IllegalArgumentException e) {
    throw new PreconditionException(
        String.format("Specified disable-preemption option %s is illegal, " +
            "use \"nopolicy\" or \"observeonly\"", disablePreemptionMode));
  }
}
{noformat}
"disablePreemptionMode" should be an enum everywhere.

3.
{noformat}
public void convertSiteProperties(Configuration conf,
    Configuration yarnSiteConfig, boolean drfUsed,
    boolean enableAsyncScheduler, boolean userPercentage,
    boolean disablePreemption, String disablePreemptionMode) {
{noformat}
Here "disablePreemptionMode" should be an enum also and make sure that it 
always has a value. If it always has a value, this part becomes much simpler:
{noformat}
if (disablePreemption &&
    disablePreemptionMode == DisablePreemptionMode.NO_POLICY) {
  yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES, "");
}
{noformat}
4.
 {{AutoCreatedQueueDeletionPolicy.class.getCanonicalName())}}

This string is referenced very often in the tests. Instead, use a final String:
{noformat}
private static final String DELETION_POLICY_CLASS =
   AutoCreatedQueueDeletionPolicy.class.getCanonicalName();
{noformat}
So the readability becomes much better.

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9618:
-
Attachment: YARN-9618.003.patch

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch
>
>
> In the current implementation, NodeListManager events block the async
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 1K
> events; overall that can be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher directly, call the RMApp event
> handler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302517#comment-17302517
 ] 

Qi Zhu commented on YARN-9618:
--

Fixed the test and checkstyle issues in the latest patch. :D

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch
>
>
> In the current implementation, NodeListManager events block the async
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 1K
> events; overall that can be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher directly, call the RMApp event
> handler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302478#comment-17302478
 ] 

Hadoop QA commented on YARN-9618:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
18s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
34s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 54s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m  
2s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
52s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 40s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/800/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 11 new + 58 unchanged - 1 fixed = 69 total (was 59) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 55s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | 

[jira] [Created] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity

2021-03-16 Thread Bilwa S T (Jira)
Bilwa S T created YARN-10697:


 Summary: Resources are displayed in bytes in UI for schedulers 
other than capacity
 Key: YARN-10697
 URL: https://issues.apache.org/jira/browse/YARN-10697
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bilwa S T
Assignee: Bilwa S T


Resources.newInstance expects memory in MB, whereas MetricsOverviewTable 
passes resources in bytes. Also, we should display memory in GB for better 
readability for the user (see the sketch below).
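A minimal illustration of the unit handling (values and names are hypothetical):

{code:java}
// convert a raw byte count into the units each consumer expects
long memoryBytes = 68_719_476_736L;                      // e.g. 64 GB reported in bytes
long memoryMB = memoryBytes / (1024L * 1024);            // MB, as Resources.newInstance expects
double memoryGB = memoryBytes / (1024.0 * 1024 * 1024);  // GB, friendlier for display
{code}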



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10659) Improve CS MappingRule %secondary_group evaluation

2021-03-16 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10659:
--
Attachment: YARN-10659.002.patch

> Improve CS MappingRule %secondary_group evaluation
> --
>
> Key: YARN-10659
> URL: https://issues.apache.org/jira/browse/YARN-10659
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10659.001.patch, YARN-10659.002.patch
>
>
> Since leaf queue names are not unique, there are many use cases where
> %secondary_group evaluation fails or behaves inconsistently.
> We should extend its behavior: when it is under a defined parent,
> %secondary_group evaluation should only check for queue existence under that
> parent. E.g. root.group.%secondary_group should only evaluate to groups which
> exist under root.group, while the legacy %secondary_group.%user should still
> look for groups by their leaf name globally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10696) Add RMNodeEvent to single async dispatcher before YARN-9927.

2021-03-16 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10696:
-

 Summary: Add RMNodeEvent to single async dispatcher before 
YARN-9927.
 Key: YARN-10696
 URL: https://issues.apache.org/jira/browse/YARN-10696
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Qi Zhu
Assignee: Qi Zhu


According to the YARN-9927 analysis, RMNodeStatusEvent accounts for 90% of the 
time consumed by the RM event scheduler.

We had better add RMNodeEvent to a separate async event handler, similar to 
YARN-9618.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10690) GPU related improvement for better usage.

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10690:
--
Description: 
This Jira will improve GPU support for better usage.

 cc [~bibinchundatt] [~pbacsko] [~ebadger] [~ztang]  [~epayne] [~gandras]  
[~bteke]

  was:
This Jira will improve GPU support for better usage.

 


> GPU related improvement for better usage.
> -
>
> Key: YARN-10690
> URL: https://issues.apache.org/jira/browse/YARN-10690
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> This Jira will improve GPU support for better usage.
>  cc [~bibinchundatt] [~pbacsko] [~ebadger] [~ztang]  [~epayne] [~gandras]  
> [~bteke]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10695) Event related improvement of YARN for better usage.

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10695:
--
Description: 
This Jira tracks event-related improvements in YARN for better usability.

 cc [~bibinchundatt] [~pbacsko] [~ebadger] [~epayne] [~gandras]  [~bteke]

  was:
This Jira tracks event-related improvements in YARN for better usability.

 cc 


> Event related improvement of YARN for better usage.
> ---
>
> Key: YARN-10695
> URL: https://issues.apache.org/jira/browse/YARN-10695
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> This Jira tracks event-related improvements in YARN for better usability.
>  cc [~bibinchundatt] [~pbacsko] [~ebadger] [~epayne] [~gandras]  [~bteke]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10695) Event related improvement of YARN for better usage.

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10695:
--
Description: 
This Jira tracks event-related improvements in YARN for better usability.

 cc [~bibinchundatt] [~pbacsko] [~ebadger] [~ztang]  [~epayne] [~gandras]  
[~bteke]

  was:
This Jira tracks event-related improvements in YARN for better usability.

 cc [~bibinchundatt] [~pbacsko] [~ebadger] [~epayne] [~gandras]  [~bteke]


> Event related improvement of YARN for better usage.
> ---
>
> Key: YARN-10695
> URL: https://issues.apache.org/jira/browse/YARN-10695
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> This Jira tracks event-related improvements in YARN for better usability.
>  cc [~bibinchundatt] [~pbacsko] [~ebadger] [~ztang]  [~epayne] [~gandras]  
> [~bteke]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10695) Event related improvement of YARN for better usage.

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10695:
--
Description: 
This Jira tracks event-related improvements in YARN for better usability.

 cc 

  was:
This Jira tracks event-related improvements in YARN for better usability.

 


> Event related improvement of YARN for better usage.
> ---
>
> Key: YARN-10695
> URL: https://issues.apache.org/jira/browse/YARN-10695
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> This Jira tracks event-related improvements in YARN for better usability.
>  cc 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302371#comment-17302371
 ] 

Qi Zhu commented on YARN-9618:
--

[~bibinchundatt] [~pbacsko] [~ebadger] [~epayne] [~gandras]  [~bteke]

Could you help review this?

I updated the patch:
 # Added another async event handler, similar to the scheduler's.
 # Instead of adding events to the dispatcher, directly call the RMApp event 
handler (see the sketch below).

Thanks.
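For reference, a rough sketch of the second point, assuming an rmContext and 
an RMNode eventNode in scope (RMApp and RMAppNodeUpdateEvent are real YARN 
classes; the loop itself is illustrative, not the patch):

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppNodeUpdateEvent;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppNodeUpdateEvent.RMAppNodeUpdateType;

// Illustrative: instead of queueing one node-update event per running app
// on the central dispatcher, invoke each app's handler directly, so a
// node usability change cannot flood the shared event queue.
for (RMApp app : rmContext.getRMApps().values()) {
  app.handle(new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
      RMAppNodeUpdateType.NODE_USABLE));
}
{code}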

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch
>
>
> In the current implementation, NodeListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 
> 1K events; overall this can be 5k*1k events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9618:
-
Attachment: YARN-9618.002.patch

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch
>
>
> In the current implementation, NodeListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 
> 1K events; overall this can be 5k*1k events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread caozhiqiang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302359#comment-17302359
 ] 

caozhiqiang commented on YARN-10501:


[~ebadger], the branch-2.10 build also failed. Do you know where I made a 
mistake?

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch, 
> YARN-10502-branch-2.10.003.patch
>
>
> When a label is added to nodes without a nodemanager port, or with the 
> WILDCARD_PORT (0) port, not all of the label info on these nodes can be 
> removed.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the 
> label info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, 
> both the 0 port and the real NM port are added to the node info, and when 
> the labels are removed, the node.labels parameter at line 647 is null, so 
> the old label is not removed.
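One possible shape of a fix, sketched from the snippet above (illustrative 
only, not the committed patch): capture the old host labels before they are 
mutated, and use that copy for every NM NodeId instead of the possibly-null 
node.labels.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Illustrative rework of the REPLACE branch quoted above.
Set<String> oldLabels = new HashSet<>(host.labels);
replaceNodeForLabels(nodeId, oldLabels, labels);
replaceLabelsForNode(nodeId, oldLabels, labels);
host.labels.clear();
host.labels.addAll(labels);
for (Node node : host.nms.values()) {
  // was: replaceNodeForLabels(node.nodeId, node.labels, labels);
  // node.labels may be null here, which left the old label in place.
  replaceNodeForLabels(node.nodeId, oldLabels, labels);
  node.labels = null;
}
break;
{code}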



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302349#comment-17302349
 ] 

Hadoop QA commented on YARN-10501:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
20s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} yetus {color} | {color:red}  0m 15s{color} 
| {color:red}{color} | {color:red} Unprocessed flag(s): 
--spotbugs-strict-precheck {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/799/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10501 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13022368/YARN-10502-branch-2.10.003.patch
 |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/799/console |
| versions | git=2.7.4 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch, 
> YARN-10502-branch-2.10.003.patch
>
>
> When a label is added to nodes without a nodemanager port, or with the 
> WILDCARD_PORT (0) port, not all of the label info on these nodes can be 
> removed.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the 
> label info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, 
> both the 0 port and the real NM port are added to the node info, and when 
> the labels are removed, the node.labels parameter at line 647 is null, so 
> the old label is not removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-10501:
---
Attachment: YARN-10502-branch-2.10.003.patch

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch, 
> YARN-10502-branch-2.10.003.patch
>
>
> When a label is added to nodes without a nodemanager port, or with the 
> WILDCARD_PORT (0) port, not all of the label info on these nodes can be 
> removed.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the 
> label info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, 
> both the 0 port and the real NM port are added to the node info, and when 
> the labels are removed, the node.labels parameter at line 647 is null, so 
> the old label is not removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302335#comment-17302335
 ] 

Hadoop QA commented on YARN-10501:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
27s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} yetus {color} | {color:red}  0m 14s{color} 
| {color:red}{color} | {color:red} Unprocessed flag(s): 
--spotbugs-strict-precheck {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/798/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10501 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13022365/YARN-10502-branch-2.10.002.patch
 |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/798/console |
| versions | git=2.7.4 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch
>
>
> When a label is added to nodes without a nodemanager port, or with the 
> WILDCARD_PORT (0) port, not all of the label info on these nodes can be 
> removed.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the 
> label info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, 
> both the 0 port and the real NM port are added to the node info, and when 
> the labels are removed, the node.labels parameter at line 647 is null, so 
> the old label is not removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-10501:
---
Attachment: YARN-10502-branch-2.10.002.patch

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch
>
>
> When a label is added to nodes without a nodemanager port, or with the 
> WILDCARD_PORT (0) port, not all of the label info on these nodes can be 
> removed.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the 
> label info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, 
> both the 0 port and the real NM port are added to the node info, and when 
> the labels are removed, the node.labels parameter at line 647 is null, so 
> the old label is not removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-10501:
---
Attachment: (was: YARN-10501-branch-2.10.1.002.patch)

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch
>
>
> When a label is added to nodes without a nodemanager port, or with the 
> WILDCARD_PORT (0) port, not all of the label info on these nodes can be 
> removed.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the 
> label info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, 
> both the 0 port and the real NM port are added to the node info, and when 
> the labels are removed, the node.labels parameter at line 647 is null, so 
> the old label is not removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-16 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-10501:
---
Attachment: (was: YARN-10501-branch-2.10.1.001.patch)

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch
>
>
> When a label is added to nodes without a nodemanager port, or with the 
> WILDCARD_PORT (0) port, not all of the label info on these nodes can be 
> removed.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the 
> label info is still present in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is at line 647: when labels are added to a node without a port, 
> both the 0 port and the real NM port are added to the node info, and when 
> the labels are removed, the node.labels parameter at line 647 is null, so 
> the old label is not removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9618) NodeListManager event improvement

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9618:
-
Parent Issue: YARN-10695  (was: YARN-9871)

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch
>
>
> In the current implementation, NodeListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 
> 1K events; overall this can be 5k*1k events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9927:
-
Parent: YARN-10695
Issue Type: Sub-task  (was: Improvement)

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and the RM event processing 
> logic, we found that:
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% of the RM event scheduler's time
> 3) meanwhile, RM event processing runs in single-threaded mode, which leaves 
> the RM event scheduler little headroom and limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM 
> performance.
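As a rough illustration of the proposal (generic Java, not the attached 
patch): shard events across several single-threaded workers keyed by node, so 
events for one node stay ordered while different nodes are processed in 
parallel.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Illustrative multi-thread event processing: per-key sharding keeps
// per-node event ordering while spreading load over N threads.
public class ShardedEventProcessor<K, E> {
  private final ExecutorService[] workers;

  public ShardedEventProcessor(int n) {
    workers = new ExecutorService[n];
    for (int i = 0; i < n; i++) {
      workers[i] = Executors.newSingleThreadExecutor();
    }
  }

  public void dispatch(K key, E event, Consumer<E> handler) {
    // Same key -> same worker, so events for one node are serialized.
    int shard = (key.hashCode() & Integer.MAX_VALUE) % workers.length;
    workers[shard].execute(() -> handler.accept(event));
  }
}
{code}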



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10695) Event related improvement of YARN for better usage.

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10695:
--
Description: 
This Jira tracks event-related improvements in YARN for better usability.

 

> Event related improvement of YARN for better usage.
> ---
>
> Key: YARN-10695
> URL: https://issues.apache.org/jira/browse/YARN-10695
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> This Jira tracks event-related improvements in YARN for better usability.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8995) Log events info in AsyncDispatcher when event queue size cumulatively reaches a certain number every time.

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-8995:
-
Parent: YARN-10695
Issue Type: Sub-task  (was: Improvement)

> Log events info in AsyncDispatcher when event queue size cumulatively reaches 
> a certain number every time.
> --
>
> Key: YARN-8995
> URL: https://issues.apache.org/jira/browse/YARN-8995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: metrics, nodemanager, resourcemanager
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: TestStreamPerf.java, 
> YARN-8995-branch-3.1.001.patch.addendum, YARN-8995.001.patch, 
> YARN-8995.002.patch, YARN-8995.003.patch, YARN-8995.004.patch, 
> YARN-8995.005.patch, YARN-8995.006.patch, YARN-8995.007.patch, 
> YARN-8995.008.patch, YARN-8995.009.patch, YARN-8995.010.patch, 
> YARN-8995.011.patch, YARN-8995.012.patch, YARN-8995.013.patch, 
> YARN-8995.014.patch, YARN-8995.015.patch, YARN-8995.016.patch, 
> image-2019-09-04-15-20-02-914.png
>
>
> In our growing cluster, there are unexpected situations where some event 
> queues degrade cluster performance, such as the bug in 
> https://issues.apache.org/jira/browse/YARN-5262 . It is necessary to log the 
> event types when the event queue grows too large, to add that information to 
> the metrics, and to make the queue-size threshold a configurable parameter.
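The core idea looks roughly like this (a sketch only; the committed change 
lives in AsyncDispatcher with its own logger, counters, and configuration 
key):

{code:java}
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative: log queue details each time the size crosses another
// multiple of a configurable threshold, instead of on every event.
public class QueueSizeLogger {
  private static final Logger LOG =
      LoggerFactory.getLogger(QueueSizeLogger.class);
  private int lastQueueSizeLogged = 0;

  void maybeLogQueueSize(int queueSize, int threshold,
      Map<String, Long> countsByType) {
    if (queueSize > 0 && queueSize % threshold == 0
        && queueSize != lastQueueSizeLogged) {
      lastQueueSizeLogged = queueSize;
      LOG.info("Event queue size = {}; counts by event type = {}",
          queueSize, countsByType);
    }
  }
}
{code}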



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM

2021-03-16 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9615:
-
Parent: YARN-10695
Issue Type: Sub-task  (was: Task)

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-9615.001.patch, YARN-9615.002.patch, 
> YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, 
> YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, 
> YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, 
> YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, 
> image-2021-03-04-10-36-12-441.png, screenshot-1.png
>
>
> It'd be good to have counts and processing times for each event type in the 
> RM async dispatcher and the scheduler async dispatcher.
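A minimal sketch of the idea (the actual patch integrates with Hadoop's 
metrics2 framework; this generic version just keeps a count and a running 
total of processing time per event type):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

// Illustrative per-event-type dispatcher metrics.
public class EventTypeMetrics {
  private final ConcurrentHashMap<String, LongAdder> counts =
      new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, LongAdder> nanos =
      new ConcurrentHashMap<>();

  public <E> void handleAndRecord(E event, Consumer<E> handler) {
    String type = event.getClass().getSimpleName();
    long start = System.nanoTime();
    handler.accept(event);  // time the actual event handling
    counts.computeIfAbsent(type, t -> new LongAdder()).increment();
    nanos.computeIfAbsent(type, t -> new LongAdder())
        .add(System.nanoTime() - start);
  }
}
{code}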



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10695) Event related improvement of YARN for better usage.

2021-03-16 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10695:
-

 Summary: Event related improvement of YARN for better usage.
 Key: YARN-10695
 URL: https://issues.apache.org/jira/browse/YARN-10695
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Qi Zhu
Assignee: Qi Zhu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2021-03-16 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302262#comment-17302262
 ] 

angerszhu commented on YARN-10495:
--

Hi [~ebadger]

When we build hadoop-3.3.0 we hit a glibc error:
{code:java}
writev(2, [{iov_base="/usr/share/hadoop-yarn/bin/conta"..., iov_len=45}, 
{iov_base=": ", iov_len=2}, {iov_base="/lib64/libc.so.6", iov_len=16}, 
{iov_base=": ", iov_len=2}, {iov_base="version `GLIBC_2.25' not found ("..., 
iov_len=75}, {iov_base="\n", iov_len=1}], 
6/usr/share/hadoop-yarn/bin/container-executor: /lib64/libc.so.6: version 
`GLIBC_2.25' not found (required by /lib64/x86_64/libcrypto.so.1.1)
{code}
Can we also bundle glibc into the native path, as `bundle.openssl` does for 
openssl, to solve this problem?
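As a side note for diagnosing this class of failure (standard ELF tooling, 
not part of any patch here):

{code:bash}
# Show the RPATH/RUNPATH baked into the binary at build time.
readelf -d bin/container-executor | grep -Ei 'rpath|runpath'
# Show which shared libraries resolve on this machine, and which are missing.
ldd bin/container-executor
{code}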

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10495.001.patch, YARN-10495.002.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> libcrypto to container-executor. We hit a case where our Jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machines don't have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library environment, 
> /usr/lib/x86_64-linux-gnu,
> and we build hadoop with the parameters below:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> On the Jenkins machine, under the shared library path 
> /usr/lib/x86_64-linux-gnu (where libcrypto lives):
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> On the nodemanager machine, under the shared library path 
> /usr/lib/x86_64-linux-gnu (where libcrypto lives):
> {code:java}
> -rw-r--r--  1 root root55852 2月   7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 9月  28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 9月  28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 12月 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 9月  28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 2月   7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 2月   7  2019 libc.so
> {code}
> We build container-executor on the Jenkins machine; since the libcrypto.so 
> versions do not match, we get an error when we start the nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make the RPATH of container-executor configurable to solve this 
> problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org