[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception

2018-05-21 Thread Oleksandr Shevchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483563#comment-16483563
 ] 

Oleksandr Shevchenko commented on YARN-7913:


Thank you [~wilfreds] for your comment. I created a separate JIRA ticket, 
YARN-7998, to address the problem related to ACL configuration changes.

> Improve error handling when application recovery fails with exception
> -
>
> Key: YARN-7913
> URL: https://issues.apache.org/jira/browse/YARN-7913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7913.000.poc.patch
>
>
> There are edge cases when the application recovery fails with an exception.
> Example failure scenario:
>  * setup: a queue is a leaf queue in the primary RM's config and the same 
> queue is a parent queue in the secondary RM's config.
>  * When failover happens with this setup, the recovery will fail for 
> applications on this queue, and an APP_REJECTED event will be dispatched to 
> the async dispatcher. On the same thread (that handles the recovery), a 
> NullPointerException is thrown when an attempt is made to recover the 
> applicationAttempt 
> (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494).
>  I don't see a good way to avoid the NPE in this scenario, because when the 
> NPE occurs the APP_REJECTED has not been processed yet, and we don't know 
> that the application recovery failed.
> Currently the first exception will abort the recovery, and if there are X 
> applications, there will be ~X passive -> active RM transition attempts - the 
> passive -> active RM transition will only succeed when the last APP_REJECTED 
> event is processed on the async dispatcher thread.
> _The point of this ticket is to improve the error handling and reduce the 
> number of passive -> active RM transition attempts (solving the above 
> described failure scenario isn't in scope)._
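For illustration, a rough sketch of the kind of error handling this ticket aims at (this is not the attached YARN-7913.000.poc.patch; recoverApplication() and LOG are stand-ins for the real recovery path):
{code:java}
// Sketch: isolate per-application recovery failures so one bad application does
// not abort the whole recovery and force another passive -> active transition.
for (ApplicationStateData appState : rmState.getApplicationState().values()) {
  try {
    recoverApplication(appState, rmState);  // hypothetical per-app recovery step
  } catch (Exception e) {
    // Log and continue instead of rethrowing, which is what currently aborts
    // the transition on the first failing application.
    LOG.error("Failed to recover application "
        + appState.getApplicationSubmissionContext().getApplicationId(), e);
  }
}
{code}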






[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-21 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483546#comment-16483546
 ] 

Szilard Nemeth commented on YARN-8248:
--

Thanks [~haibochen] for the reviews!

> Job hangs when a job requests a resource that its queue does not have
> -
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch, YARN-8248-007.patch, YARN-8248-008.patch, 
> YARN-8248-009.patch, YARN-8248-010.patch, YARN-8248-011.patch, 
> YARN-8248-012.patch, YARN-8248-013.patch, YARN-8248-014.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other)
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:java}
> <queue name="sample_queue">
>   <minResources>1 mb,0vcores</minResources>
>   <maxResources>9 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}
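For illustration, a hedged sketch of the kind of submission-time check the ticket asks for (names such as queueMaxShare and rejectApplication are placeholders, not the committed fix):
{code:java}
// Sketch: reject the submission up front if the target queue caps any requested
// resource type at 0, instead of leaving the job "added but not activated".
for (ResourceInformation requested : amRequest.getCapability().getResources()) {
  long maxInQueue = queueMaxShare.getResourceValue(requested.getName());
  if (requested.getValue() > 0 && maxInQueue == 0) {
    rejectApplication(applicationId, "Queue " + queueName
        + " has a maximum of 0 for resource " + requested.getName()
        + " and can never satisfy this request");
    return;
  }
}
{code}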






[jira] [Commented] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition

2018-05-21 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483530#comment-16483530
 ] 

Wilfred Spiegelenburg commented on YARN-4677:
-

This is the exception thrown when the issue happens:
{code:java}
FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:892)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1089)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:709)
  at java.lang.Thread.run(Thread.java:748)
{code}
The NPE is not caught, which triggers the uncaught exception handler and, in turn, 
the RM exit. The code comes from a custom code base patched with a number of 
changes; this is the snippet that reflects that specific codebase:
{code:java}
891 if (nm.getState() == NodeState.DECOMMISSIONING) {
892   this.rmContext
893       .getDispatcher()
894       .getEventHandler()
895       .handle(
896           new RMNodeResourceUpdateEvent(nm.getNodeID(), ResourceOption
897               .newInstance(getSchedulerNode(nm.getNodeID())
898                   .getUsedResource(), 0)));
899 }
{code}
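For reference, a minimal sketch of one way to guard that block, assuming getSchedulerNode() can return null once the node has already been removed from the scheduler (this is not the attached patch):
{code:java}
// Sketch: look the node up once and skip the resource-update event when the
// scheduler no longer tracks it, avoiding the NPE on getUsedResource().
FSSchedulerNode schedulerNode = getSchedulerNode(nm.getNodeID());
if (nm.getState() == NodeState.DECOMMISSIONING && schedulerNode != null) {
  this.rmContext
      .getDispatcher()
      .getEventHandler()
      .handle(new RMNodeResourceUpdateEvent(nm.getNodeID(),
          ResourceOption.newInstance(schedulerNode.getUsedResource(), 0)));
}
{code}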

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --
>
> Key: YARN-4677
> URL: https://issues.apache.org/jira/browse/YARN-4677
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Brook Zhou
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-4677-branch-2.001.patch, 
> YARN-4677-branch-2.002.patch, YARN-4677.01.patch
>
>
> When a node is in the decommissioning state, there is a time window between 
> completedContainer() and the RMNodeResourceUpdateEvent being handled in 
> scheduler.nodeUpdate (YARN-3223). 
> So if a scheduling effort happens within this window, a new container could 
> still be allocated on this node. The even worse case is when a scheduling effort 
> happens after the RMNodeResourceUpdateEvent is sent out but before it is propagated 
> to the SchedulerNode - then the total resource is lower than the used resource and 
> the available resource becomes negative. 






[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-21 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-8320:

Attachment: CPU-isolation-for-latency-sensitive-services-v1.pdf

> Add support CPU isolation for latency-sensitive  (LS) service
> -
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate CPU resources. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler with 
> no support for differentiated latency.
>  * The request latency of services running in containers can fluctuate heavily 
> when all containers share CPUs, which latency-sensitive services cannot afford 
> in our production environment.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses the cgroup cpuset to bind 
> containers to different processors; this is inspired by the isolation 
> technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
>  I will upload a detailed design doc later.
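For illustration only, a rough sketch of the cpuset binding described above (the cgroup mount point, hierarchy name and CPU ranges are assumptions, not part of the proposal):
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of binding a container to dedicated CPUs with the cgroup v1 cpuset
// controller; the NodeManager would drive this per container.
public class CpusetPinningSketch {
  static void pinContainer(String containerId, String cpus, String mems)
      throws IOException {
    Path dir = Paths.get("/sys/fs/cgroup/cpuset/hadoop-yarn", containerId);
    Files.createDirectories(dir);
    // Both cpuset.cpus and cpuset.mems must be populated before any task
    // can be attached to the cgroup.
    Files.write(dir.resolve("cpuset.cpus"), cpus.getBytes(StandardCharsets.UTF_8));
    Files.write(dir.resolve("cpuset.mems"), mems.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    // Example: pin a latency-sensitive container to CPUs 4-7 on NUMA node 0.
    pinContainer("container_e01_0001_01_000002", "4-7", "0");
  }
}
{code}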






[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-21 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-8320:

Attachment: (was: CPU-isolation-for-latency-sensitive-services-v1.pdf)

> Add support CPU isolation for latency-sensitive  (LS) service
> -
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate CPU resources. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler with 
> no support for differentiated latency.
>  * The request latency of services running in containers can fluctuate heavily 
> when all containers share CPUs, which latency-sensitive services cannot afford 
> in our production environment.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses the cgroup cpuset to bind 
> containers to different processors; this is inspired by the isolation 
> technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
>  I will upload a detailed design doc later.






[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception

2018-05-21 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483490#comment-16483490
 ] 

Wilfred Spiegelenburg commented on YARN-7913:
-

Just ran into this issue and stumbled onto this jira. First point: I do not see 
how continuing on really solves the issue. We are recovering running 
applications when this issue occurs. If we have failed or finished applications 
we shortcut and process things differently. I fixed restoring failed and finished 
applications into the proper queues in YARN-7139.

Do we really want to fail recovering a running application when we have a 
scheduler configuration that is out of sync? We do not allow the removal of a 
queue that is not empty, and we also handle other configuration changes gracefully; 
that is all covered in YARN-8191. To now say we just dump a running application 
and forget about it is not the correct approach. I think we should handle this 
gracefully and properly restore the running application.

As described by [~oshevchenko], we should take the ACLs, queues, etc. into account. 
YARN-2308 handles it for the capacity scheduler by stating that you MUST 
have all the queues from the previous state in the system. We can do something 
like that, or something more dynamic. I worked with Yufei a while back on a 
possible solution around this that fits in with the dynamic way we work with rules etc.

I am happy to add a poc for this to the jira.



> Improve error handling when application recovery fails with exception
> -
>
> Key: YARN-7913
> URL: https://issues.apache.org/jira/browse/YARN-7913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7913.000.poc.patch
>
>
> There are edge cases when the application recovery fails with an exception.
> Example failure scenario:
>  * setup: a queue is a leaf queue in the primary RM's config and the same 
> queue is a parent queue in the secondary RM's config.
>  * When failover happens with this setup, the recovery will fail for 
> applications on this queue, and an APP_REJECTED event will be dispatched to 
> the async dispatcher. On the same thread (that handles the recovery), a 
> NullPointerException is thrown when an attempt is made to recover the 
> applicationAttempt 
> (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494).
>  I don't see a good way to avoid the NPE in this scenario, because when the 
> NPE occurs the APP_REJECTED has not been processed yet, and we don't know 
> that the application recovery failed.
> Currently the first exception will abort the recovery, and if there are X 
> applications, there will be ~X passive -> active RM transition attempts - the 
> passive -> active RM transition will only succeed when the last APP_REJECTED 
> event is processed on the async dispatcher thread.
> _The point of this ticket is to improve the error handling and reduce the 
> number of passive -> active RM transition attempts (solving the above 
> described failure scenario isn't in scope)._






[jira] [Comment Edited] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483464#comment-16483464
 ] 

Yongjun Zhang edited comment on YARN-8108 at 5/22/18 4:15 AM:
--

Hi [~eyang] and [~kbadani],

Would you please confirm that this jira is a blocker for 3.0.3 release? If so, 
can we expedite the work here since we are in the process of preparing 3.0.3 
release?

Thanks a lot.




was (Author: yzhangal):
Hi Guys,

Would you please confirm that this jira is a blocker for 3.0.3 release? If so, 
can we expedite the work here since we are in the process of preparing 3.0.3 
release?

Thanks a lot.



> RM metrics rest API throws GSSException in kerberized environment
> -
>
> Key: YARN-8108
> URL: https://issues.apache.org/jira/browse/YARN-8108
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kshitij Badani
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-8108.001.patch
>
>
> Test is trying to pull up metrics data from SHS after kiniting as 'test_user'
> It is throwing GSSException as follows
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D 
> /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : 
> http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15
>  07:15:48,757|INFO|MainThread|machine.py:194 - 
> run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - 
> getMetricsJsonData()|metrics:
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /proxy/application_1518674952153_0070/metrics/json. 
> Reason:
>  GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {code}
> Root cause: the proxy server on the RM can't be supported for a Kerberos-enabled 
> cluster because the AuthenticationFilter is applied twice in the Hadoop code (once in 
> HttpServer2 for the RM, and another instance from AmFilterInitializer for the proxy 
> server). This will require code changes to the hadoop-yarn-server-web-proxy 
> project.






[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483464#comment-16483464
 ] 

Yongjun Zhang commented on YARN-8108:
-

Hi Guys,

Would you please confirm that this jira is a blocker for 3.0.3 release? If so, 
can we expedite the work here since we are in the process of preparing 3.0.3 
release?

Thanks a lot.



> RM metrics rest API throws GSSException in kerberized environment
> -
>
> Key: YARN-8108
> URL: https://issues.apache.org/jira/browse/YARN-8108
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kshitij Badani
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-8108.001.patch
>
>
> Test is trying to pull up metrics data from SHS after kiniting as 'test_user'
> It is throwing GSSException as follows
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D 
> /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : 
> http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15
>  07:15:48,757|INFO|MainThread|machine.py:194 - 
> run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - 
> getMetricsJsonData()|metrics:
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /proxy/application_1518674952153_0070/metrics/json. 
> Reason:
>  GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {code}
> Root cause: the proxy server on the RM can't be supported for a Kerberos-enabled 
> cluster because the AuthenticationFilter is applied twice in the Hadoop code (once in 
> HttpServer2 for the RM, and another instance from AmFilterInitializer for the proxy 
> server). This will require code changes to the hadoop-yarn-server-web-proxy 
> project.






[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483402#comment-16483402
 ] 

genericqa commented on YARN-8292:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 23s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 12 new + 97 unchanged - 0 fixed = 109 total (was 97) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
21s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m  4s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}158m 10s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.monitor.capacity.TestPreemptionForQueueWithPriorities
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8292 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924438/YARN-8292.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7acd2ecf598c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess

[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Anbang Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483394#comment-16483394
 ] 

Anbang Hu commented on YARN-8327:
-

+1 on [^YARN-8327.v2.patch].

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8327.v1.patch, YARN-8327.v2.patch, 
> image-2018-05-18-16-52-08-250.png, image-2018-05-21-09-05-49-550.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.
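The usual fix for this class of failure, sketched here with invented log content and not necessarily what the attached patches do, is to build the expected string with the platform line separator instead of a hard-coded \n:
{code:java}
// Sketch: assemble the expected log content with the platform line separator so
// the assertion also holds on Windows (\r\n) and not only on Linux (\n).
String sep = System.lineSeparator();
String expectedContent = "Hello stdout" + sep + "Hello stderr" + sep;
assertEquals(expectedContent, actualContainerLogOutput);
{code}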






[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483206#comment-16483206
 ] 

genericqa commented on YARN-8336:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 48s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
27s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
48s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
14s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 28m 
25s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
|  |  Nullcheck of webServiceClient at line 59 of value previously dereferenced 
in 
org.apache.hadoop.yarn.webapp.util.YarnWebServiceUtils.getNodeInfoFromRMWebService(Configuration,
 String)  At YarnWebServiceUtils.java:59 of value previously dereferenced in 
org.apache.hadoop.yarn.webapp.util.YarnWebServiceUtils.getNodeInfoFromRMWebService(Con

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483185#comment-16483185
 ] 

Wangda Tan commented on YARN-8292:
--

Rebased to latest trunk (006)

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, each of size 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending resources.
> Under the existing logic, preemption cannot happen.






[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8292:
-
Attachment: YARN-8292.006.patch

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, each of size 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending resources.
> Under the existing logic, preemption cannot happen.






[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483182#comment-16483182
 ] 

genericqa commented on YARN-8334:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} YARN-7402 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
24s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
27s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} YARN-7402 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
36s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hadoop-yarn-server-globalpolicygenerator in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator
 |
|  |  Nullcheck of client at line 56 of value previously dereferenced in 
org.apache.hadoop.yarn.server.globalpolicygenerator.GPGUtils.invokeRMWebService(Configuration,
 String, String, Class)  At GPGUtils.java:56 of value previously dereferenced 
in 
org.apache.hadoop.yarn.server.globalpolicygenerator.GPGUtils.invokeRMWebService(Configuration,
 String, String, Class)  At GPGUtils.java:[line 56] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-8334 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924405/YARN-8334-YARN-7402.v1.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d40cad36d959 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 
21:23:04 UTC 2018 x86_

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483181#comment-16483181
 ] 

Wangda Tan commented on YARN-8292:
--

Thanks [~eepayne], 

I just checked both.

For the intra-queue preemption behavior:
bq. For example, if gpu is the extended resource, but no apps are currently 
using gpu in the queue, no intra-queue preemption will take place.
I think you're correct; the change I propose is:
{code} 
  if (conservativeDRF) {
// When we want to do less aggressive preemption, we don't want to
// preempt from any resource type if after preemption it becomes 0 or
// negative.
// For example:
// - to-obtain = <30, 20, 0>, container <20, 20, 0> => allowed
// - to-obtain = <30, 20, 0>, container <10, 10, 1> => disallowed
// - to-obtain = <20, 30, 1>, container <20, 30, 1> => allowed
// - to-obtain = <10, 20, 1>, container <11, 11, 0> = disallowed.
doPreempt = Resources.lessThan(rc, clusterResource,
Resources
.componentwiseMin(toObtainAfterPreemption, Resources.none()),
Resources.componentwiseMin(toObtainByPartition, Resources.none()));
{code}
However, this causes many (more than 20) intra-queue preemption test case 
failures. Since the logic (ver.005 patch) is not a regression, can we address 
this in a separate JIRA if we cannot come up with a simple solution? 

For:
bq. I don't think this is necessary. ..
Actually this is required after the change.
TL;DR:
We now deduct unassigned (in the 005 patch) while doing the calculation, but the 
previous logic doesn't; it deducts it after each iteration:
{code}
Resources.addTo(wQassigned, wQdone);
  }
  Resources.subtractFrom(unassigned, wQassigned);
{code}

In our logic, we need the up-to-date unassigned value to cap {{wQavail}}, so we 
deduct it during the calculation.
{code}
// Make sure it is not beyond unassigned
wQavail = Resources.componentwiseMin(wQavail, unassigned);
{code}

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, each of size 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending resources.
> Under the existing logic, preemption cannot happen.






[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483164#comment-16483164
 ] 

genericqa commented on YARN-8327:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 41s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
13s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8327 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924426/YARN-8327.v2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 380119518e8b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0b4c44b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20813/testReport/ |
| Max. process+thread count | 407 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20813/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --

[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-21 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483159#comment-16483159
 ] 

Botong Huang commented on YARN-8334:


Thanks [~giovanni.fumarola] for the patch. Calling destroy() explicitly seems 
weird. Is it possible to avoid that? Otherwise LGTM. 
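For context, a minimal Jersey 1.x sketch of what closing both objects could look like (the method shape only loosely mirrors invokeRMWebService; this is not the attached patch):
{code:java}
static <T> T invokeWithCleanup(String webAddress, String path, Class<T> returnType) {
  Client client = Client.create();
  ClientResponse response = null;
  try {
    WebResource resource = client.resource(webAddress).path(path);
    response = resource.accept(MediaType.APPLICATION_XML).get(ClientResponse.class);
    return response.getEntity(returnType);
  } finally {
    // Release the per-request response and the client itself, even on failure.
    if (response != null) {
      response.close();
    }
    client.destroy();
  }
}
{code}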

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.






[jira] [Commented] (YARN-8329) Docker client configuration can still be set incorrectly

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483140#comment-16483140
 ] 

Jason Lowe commented on YARN-8329:
--

Thanks for the patch!

I'm wondering why the code is filtering the tokens by kind into a separate 
credentials only to pass that to another method which will also filter the 
tokens by the same kind.  Can't we just iterate the original credentials 
directly, ignoring tokens that aren't the appropriate kind?  I'm not seeing why 
the copy is necessary.  Eliminating the copy would also eliminate the need to 
do a token identifier decode to construct an alias.
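As a hedged sketch of iterating the credentials directly (DockerCredentialTokenIdentifier.KIND is assumed to be the relevant token kind, and handleDockerToken() is a placeholder for the existing write-out logic):
{code:java}
// Sketch: walk the original credentials once and act only on Docker client
// credential tokens, with no intermediate filtered Credentials copy.
for (Token<? extends TokenIdentifier> token : credentials.getAllTokens()) {
  if (!DockerCredentialTokenIdentifier.KIND.equals(token.getKind())) {
    continue; // not a Docker client credential, nothing to do
  }
  handleDockerToken(token); // placeholder for writing the docker config entry
}
{code}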


> Docker client configuration can still be set incorrectly
> 
>
> Key: YARN-8329
> URL: https://issues.apache.org/jira/browse/YARN-8329
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8329.001.patch
>
>
> YARN-7996 implemented a fix to avoid writing an empty Docker client 
> configuration file, but there are still cases where the {{docker --config}} 
> argument is set in error.






[jira] [Commented] (YARN-7960) Add no-new-privileges flag to docker run

2018-05-21 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483138#comment-16483138
 ] 

Eric Yang commented on YARN-7960:
-

+1 looks good to me.

> Add no-new-privileges flag to docker run
> 
>
> Key: YARN-7960
> URL: https://issues.apache.org/jira/browse/YARN-7960
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7960.001.patch, YARN-7960.002.patch
>
>
> Minimally, this should be used for unprivileged containers. It's a cheap way 
> to add an extra layer of security to the docker model. For privileged 
> containers, it might be appropriate to omit this flag.
> https://github.com/moby/moby/pull/20727
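For illustration, a hedged sketch of where the flag would be added when building the docker run arguments (the surrounding argument list is invented for the example):
{code:java}
// Sketch: add --security-opt=no-new-privileges only for unprivileged containers.
List<String> dockerRunArgs =
    new ArrayList<>(Arrays.asList("run", "--name", containerId));
if (!privilegedContainer) {
  dockerRunArgs.add("--security-opt=no-new-privileges");
}
{code}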






[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483119#comment-16483119
 ] 

genericqa commented on YARN-8041:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 56s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 54 new 
+ 18 unchanged - 0 fixed = 72 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
7s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m  
0s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
21s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8041 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924401/YARN-8041.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 75b548cc4895 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build t

[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-21 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483116#comment-16483116
 ] 

Eric Yang commented on YARN-8326:
-

[~hlhu...@us.ibm.com] Do the same log entries show up?

> Yarn 3.0 seems runs slower than Yarn 2.6
> 
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
> Environment: This is the yarn-site.xml for 3.0. 
>  
> 
> 
>  hadoop.registry.dns.bind-port
>  5353
>  
> 
>  hadoop.registry.dns.domain-name
>  hwx.site
>  
> 
>  hadoop.registry.dns.enabled
>  true
>  
> 
>  hadoop.registry.dns.zone-mask
>  255.255.255.0
>  
> 
>  hadoop.registry.dns.zone-subnet
>  172.17.0.0
>  
> 
>  manage.include.files
>  false
>  
> 
>  yarn.acl.enable
>  false
>  
> 
>  yarn.admin.acl
>  yarn
>  
> 
>  yarn.client.nodemanager-connect.max-wait-ms
>  6
>  
> 
>  yarn.client.nodemanager-connect.retry-interval-ms
>  1
>  
> 
>  yarn.http.policy
>  HTTP_ONLY
>  
> 
>  yarn.log-aggregation-enable
>  false
>  
> 
>  yarn.log-aggregation.retain-seconds
>  2592000
>  
> 
>  yarn.log.server.url
>  
> [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs]
>  
> 
>  yarn.log.server.web-service.url
>  
> [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory]
>  
> 
>  yarn.node-labels.enabled
>  false
>  
> 
>  yarn.node-labels.fs-store.retry-policy-spec
>  2000, 500
>  
> 
>  yarn.node-labels.fs-store.root-dir
>  /system/yarn/node-labels
>  
> 
>  yarn.nodemanager.address
>  0.0.0.0:45454
>  
> 
>  yarn.nodemanager.admin-env
>  MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
>  
> 
>  yarn.nodemanager.aux-services
>  mapreduce_shuffle,spark2_shuffle,timeline_collector
>  
> 
>  yarn.nodemanager.aux-services.mapreduce_shuffle.class
>  org.apache.hadoop.mapred.ShuffleHandler
>  
> 
>  yarn.nodemanager.aux-services.spark2_shuffle.class
>  org.apache.spark.network.yarn.YarnShuffleService
>  
> 
>  yarn.nodemanager.aux-services.spark2_shuffle.classpath
>  /usr/spark2/aux/*
>  
> 
>  yarn.nodemanager.aux-services.spark_shuffle.class
>  org.apache.spark.network.yarn.YarnShuffleService
>  
> 
>  yarn.nodemanager.aux-services.timeline_collector.class
>  
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService
>  
> 
>  yarn.nodemanager.bind-host
>  0.0.0.0
>  
> 
>  yarn.nodemanager.container-executor.class
>  
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
>  
> 
>  yarn.nodemanager.container-metrics.unregister-delay-ms
>  6
>  
> 
>  yarn.nodemanager.container-monitor.interval-ms
>  3000
>  
> 
>  yarn.nodemanager.delete.debug-delay-sec
>  0
>  
> 
>  
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
>  90
>  
> 
>  yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
>  1000
>  
> 
>  yarn.nodemanager.disk-health-checker.min-healthy-disks
>  0.25
>  
> 
>  yarn.nodemanager.health-checker.interval-ms
>  135000
>  
> 
>  yarn.nodemanager.health-checker.script.timeout-ms
>  6
>  
> 
>  
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage
>  false
>  
> 
>  yarn.nodemanager.linux-container-executor.group
>  hadoop
>  
> 
>  
> yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users
>  false
>  
> 
>  yarn.nodemanager.local-dirs
>  /hadoop/yarn/local
>  
> 
>  yarn.nodemanager.log-aggregation.compression-type
>  gz
>  
> 
>  yarn.nodemanager.log-aggregation.debug-enabled
>  false
>  
> 
>  yarn.nodemanager.log-aggregation.num-log-files-per-app
>  30
>  
> 
>  
> yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds
>  3600
>  
> 
>  yarn.nodemanager.log-dirs
>  /hadoop/yarn/log
>  
> 
>  yarn.nodemanager.log.retain-seconds
>  604800
>  
> 
>  yarn.nodemanager.pmem-check-enabled
>  false
>  
> 
>  yarn.nodemanager.recovery.dir
>  /var/log/hadoop-yarn/nodemanager/recovery-state
>  
> 
>  yarn.nodemanager.recovery.enabled
>  true
>  
> 
>  yarn.nodemanager.recovery.supervised
>  true
>  
> 
>  yarn.nodemanager.remote-app-log-dir
>  /app-logs
>  
> 
>  yarn.nodemanager.remote-app-log-dir-suffix
>  logs
>  
> 
>  yarn.nodemanager.resource-plugins
>  
>  
> 
>  yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices
>  auto
>  
> 
>  yarn.nodemanager.resource-plugins.gpu.docker-plugin
>  nvidia-docker-v1
>  
> 
>  yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidiadocker-
>  v1.endpoint
>  [http://localhost:3476/v1.0/docker/cli]
>  
> 
>  
> yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables
>  
>  
> 
>  yarn.nodemanager.resource.cpu-vcores
>  6
>  
> 
>  yarn.nodemanager.resource.memory-mb
>  12288
>  
> 
>  yarn.nodemanager.resource.percentage-ph

[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483099#comment-16483099
 ] 

Giovanni Matteo Fumarola commented on YARN-8327:


[~huanbang1993] thanks for the review. Attached v2.

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8327.v1.patch, YARN-8327.v2.patch, 
> image-2018-05-18-16-52-08-250.png, image-2018-05-21-09-05-49-550.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8327:
---
Attachment: YARN-8327.v2.patch

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8327.v1.patch, YARN-8327.v2.patch, 
> image-2018-05-18-16-52-08-250.png, image-2018-05-21-09-05-49-550.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8173:
---
Attachment: YARN-8327.v2.patch

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8173:
---
Attachment: (was: YARN-8327.v2.patch)

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483087#comment-16483087
 ] 

Jason Lowe commented on YARN-8206:
--

Thanks for updating the patch!  +1 lgtm.  I'll commit this tomorrow if there 
are no objections.


> Sending a kill does not immediately kill docker containers
> --
>
> Key: YARN-8206
> URL: https://issues.apache.org/jira/browse/YARN-8206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8206.001.patch, YARN-8206.002.patch, 
> YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, 
> YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, 
> YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch
>
>
> {noformat}
> if (ContainerExecutor.Signal.KILL.equals(signal)
> || ContainerExecutor.Signal.TERM.equals(signal)) {
>   handleContainerStop(containerId, env);
> {noformat}
> Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent 
> for docker containers. However, they should actually be separate. When YARN 
> sends a SIGKILL to a process, it means for it to die immediately and not sit 
> around waiting for anything. This ensures an immediate reclamation of 
> resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task 
> might not handle the signal correctly, and will then end up as a failed task 
> instead of a killed task. This is especially bad for preemption. 
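
As a purely illustrative sketch of the separation argued for in that description (the enum and handler logic below are local stand-ins, not the real YARN classes or any patch), the two signals would take different paths:

{code:java}
public class SignalSeparationSketch {
  enum Signal { TERM, KILL }

  static void handle(Signal signal, String containerId) {
    if (signal == Signal.KILL) {
      // immediate reclamation of resources, analogous to "docker kill"
      System.out.println("docker kill " + containerId);
    } else if (signal == Signal.TERM) {
      // graceful shutdown, analogous to "docker stop"
      System.out.println("docker stop " + containerId);
    }
  }

  public static void main(String[] args) {
    handle(Signal.KILL, "container_1");
  }
}
{code}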



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483057#comment-16483057
 ] 

Eric Yang commented on YARN-8259:
-

A system administrator can reserve one CPU core for the node manager so that all 
of the docker inspect calls together saturate at most that one core, but no more.  
Exact accounting is not available today, but I usually recommend that customers 
do this to avoid system overload.

At a glance over the YARN code base, I found only one place that reads 
/proc/[pid]/ from the node manager; it is located in 
CGroupsResourceCalculator.java.  Hence, hidepid does not really work with the 
current implementation.  This can be addressed in other JIRAs to make it proper.  
I am +0 on this patch.
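
For illustration, a minimal sketch of the /proc-based existence check under discussion could look like the following (the class and method names are hypothetical, not taken from any patch, and the hidepid caveat above still applies):

{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Minimal sketch: a container is considered alive while /proc/<pid> exists.
 * On a system mounted with hidepid, the node manager user may not be able to
 * see this directory, which is exactly the concern raised above.
 */
public final class ProcLivelinessCheck {
  private ProcLivelinessCheck() {
  }

  public static boolean isProcessAlive(String pid) {
    // /proc/<pid> is a directory only while the process is running (Linux).
    return Files.isDirectory(Paths.get("/proc", pid));
  }
}
{code}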

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483046#comment-16483046
 ] 

Eric Payne commented on YARN-8179:
--

Committed to trunk, branch-3.1, and branch-3.0. Thanks for the good work, 
[~kyungwan nam]

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> cluster
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> all of resources have been allocated to QueueA. (all Vcores are allocated to 
> QueueA)
> if App1 is submitted to QueueB, over-utilized QueueA should be preempted.
> but, I’ve met the problem, which preemption does not happen. it caused that 
> App1 AM can not allocated.
> when App1 is submitted, pending resources for asking App1 AM would be 
> 
> so, Vcores which need to be preempted from QueueB should be 1.
> but, it can be 0 due to natural_termination_factor (default is 0.2)
> we should guarantee that resources not to be 0 even though applying 
> natural_termination_factor
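
For concreteness, the rounding effect described above can be reproduced with a tiny calculation (the values and the flooring step are illustrative, not taken from the patch):

{code:java}
// 1 pending vcore scaled by the default natural_termination_factor of 0.2
// gives 0.2, which truncates to 0 vcores selected for preemption, so the
// AM request stays pending.
public class NaturalTerminationFactorExample {
  public static void main(String[] args) {
    int pendingVcores = 1;
    double naturalTerminationFactor = 0.2;
    int toPreempt = (int) Math.floor(pendingVcores * naturalTerminationFactor);
    System.out.println("vcores selected for preemption: " + toPreempt); // 0
  }
}
{code}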



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483041#comment-16483041
 ] 

Eric Badger commented on YARN-8259:
---

Also, I have tested the current patch for correctness. So, if we decide to go 
with the current implementation, I am +1 on the patch. 

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483038#comment-16483038
 ] 

Eric Badger commented on YARN-8259:
---

bq. If hidepid option is used by system administrator, yarn user might not have 
rights to check if /proc/[pid] exists.
This might be a concern, but there is a workaround to allow for the admin to 
whitelist the NM user
https://linux-audit.com/linux-system-hardening-adding-hidepid-to-proc/

bq. Also, the reacquisition code runs signalContainer once per second until the 
application finishes, this resulted in many docker inspect and 
container-executor calls, which are expensive operations.
This worries me the most. Especially on nodes where there are lots of 
containers running concurrently, this could be pretty devastating for rolling 
upgrades.

I'm not sure I have a strong opinion one way or another on retries vs. /proc 
for correctness, but I am worried about overloading the docker daemon with a 
large number of inspect/ps calls. 
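
One way to bound that load, purely as an illustrative sketch (none of these names come from a patch), would be to cache the result of the last inspect for a short interval so repeated per-second checks do not all reach the daemon:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative sketch: briefly memoize an expensive liveliness probe. */
public class CachedLivelinessProbe {
  private final Supplier<Boolean> probe;  // e.g. wraps a docker inspect call
  private final long ttlNanos;
  private boolean initialized;
  private long lastCheckNanos;
  private boolean lastResult;

  public CachedLivelinessProbe(Supplier<Boolean> probe, long ttlMillis) {
    this.probe = probe;
    this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
  }

  public synchronized boolean isAlive() {
    long now = System.nanoTime();
    if (!initialized || now - lastCheckNanos > ttlNanos) {
      lastResult = probe.get();   // only probe when the cached value expired
      lastCheckNanos = now;
      initialized = true;
    }
    return lastResult;
  }
}
{code}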

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483035#comment-16483035
 ] 

Hudson commented on YARN-8179:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14247 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14247/])
YARN-8179: Preemption does not happen due to natural_termination_factor (ericp: 
rev 0b4c44bdeef62945b592d5761666ad026b629c0b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyInterQueueWithDRF.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/PreemptableResourceCalculator.java


> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> cluster
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> all of resources have been allocated to QueueA. (all Vcores are allocated to 
> QueueA)
> if App1 is submitted to QueueB, over-utilized QueueA should be preempted.
> but, I’ve met the problem, which preemption does not happen. it caused that 
> App1 AM can not allocated.
> when App1 is submitted, pending resources for asking App1 AM would be 
> 
> so, Vcores which need to be preempted from QueueB should be 1.
> but, it can be 0 due to natural_termination_factor (default is 0.2)
> we should guarantee that resources not to be 0 even though applying 
> natural_termination_factor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483032#comment-16483032
 ] 

Jason Lowe commented on YARN-8259:
--

I do agree with Shane that there are already subsystems that currently rely on 
/proc to function properly, e.g.: container resource monitoring.  Hiding pids 
will break those subsystems.

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483029#comment-16483029
 ] 

Jason Lowe commented on YARN-8259:
--

Ah comment race with [~eyang], I'll defer until his concerns are addressed.

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483028#comment-16483028
 ] 

Jason Lowe commented on YARN-8259:
--

Thanks for the patch!  +1 lgtm.  I'll commit this tomorrow if there are no 
objections.

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-21 Thread Hsin-Liang Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483027#comment-16483027
 ] 

Hsin-Liang Huang commented on YARN-8326:


Hi Eric, 

   I tried the suggestion and changed the setting.  The result on running 

{color:#14892c}time hadoop jar 
/usr/hdp/3.0.0.0-829/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.0.0.3.0.0.0-829.jar
 Client -classpath simple-yarn-app-1.1.0.jar -cmd "java 
com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 8"{color}

 is 20s, 15s, and 15s (I ran it 3 times).  It didn't get better, if not worse.  
(It was 14 and 15 seconds before.) 

> Yarn 3.0 seems runs slower than Yarn 2.6
> 
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
> Environment: This is the yarn-site.xml for 3.0. 
>  
> 
> 
>  hadoop.registry.dns.bind-port
>  5353
>  
> 
>  hadoop.registry.dns.domain-name
>  hwx.site
>  
> 
>  hadoop.registry.dns.enabled
>  true
>  
> 
>  hadoop.registry.dns.zone-mask
>  255.255.255.0
>  
> 
>  hadoop.registry.dns.zone-subnet
>  172.17.0.0
>  
> 
>  manage.include.files
>  false
>  
> 
>  yarn.acl.enable
>  false
>  
> 
>  yarn.admin.acl
>  yarn
>  
> 
>  yarn.client.nodemanager-connect.max-wait-ms
>  6
>  
> 
>  yarn.client.nodemanager-connect.retry-interval-ms
>  1
>  
> 
>  yarn.http.policy
>  HTTP_ONLY
>  
> 
>  yarn.log-aggregation-enable
>  false
>  
> 
>  yarn.log-aggregation.retain-seconds
>  2592000
>  
> 
>  yarn.log.server.url
>  
> [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs]
>  
> 
>  yarn.log.server.web-service.url
>  
> [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory]
>  
> 
>  yarn.node-labels.enabled
>  false
>  
> 
>  yarn.node-labels.fs-store.retry-policy-spec
>  2000, 500
>  
> 
>  yarn.node-labels.fs-store.root-dir
>  /system/yarn/node-labels
>  
> 
>  yarn.nodemanager.address
>  0.0.0.0:45454
>  
> 
>  yarn.nodemanager.admin-env
>  MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
>  
> 
>  yarn.nodemanager.aux-services
>  mapreduce_shuffle,spark2_shuffle,timeline_collector
>  
> 
>  yarn.nodemanager.aux-services.mapreduce_shuffle.class
>  org.apache.hadoop.mapred.ShuffleHandler
>  
> 
>  yarn.nodemanager.aux-services.spark2_shuffle.class
>  org.apache.spark.network.yarn.YarnShuffleService
>  
> 
>  yarn.nodemanager.aux-services.spark2_shuffle.classpath
>  /usr/spark2/aux/*
>  
> 
>  yarn.nodemanager.aux-services.spark_shuffle.class
>  org.apache.spark.network.yarn.YarnShuffleService
>  
> 
>  yarn.nodemanager.aux-services.timeline_collector.class
>  
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService
>  
> 
>  yarn.nodemanager.bind-host
>  0.0.0.0
>  
> 
>  yarn.nodemanager.container-executor.class
>  
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
>  
> 
>  yarn.nodemanager.container-metrics.unregister-delay-ms
>  6
>  
> 
>  yarn.nodemanager.container-monitor.interval-ms
>  3000
>  
> 
>  yarn.nodemanager.delete.debug-delay-sec
>  0
>  
> 
>  
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
>  90
>  
> 
>  yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
>  1000
>  
> 
>  yarn.nodemanager.disk-health-checker.min-healthy-disks
>  0.25
>  
> 
>  yarn.nodemanager.health-checker.interval-ms
>  135000
>  
> 
>  yarn.nodemanager.health-checker.script.timeout-ms
>  6
>  
> 
>  
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage
>  false
>  
> 
>  yarn.nodemanager.linux-container-executor.group
>  hadoop
>  
> 
>  
> yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users
>  false
>  
> 
>  yarn.nodemanager.local-dirs
>  /hadoop/yarn/local
>  
> 
>  yarn.nodemanager.log-aggregation.compression-type
>  gz
>  
> 
>  yarn.nodemanager.log-aggregation.debug-enabled
>  false
>  
> 
>  yarn.nodemanager.log-aggregation.num-log-files-per-app
>  30
>  
> 
>  
> yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds
>  3600
>  
> 
>  yarn.nodemanager.log-dirs
>  /hadoop/yarn/log
>  
> 
>  yarn.nodemanager.log.retain-seconds
>  604800
>  
> 
>  yarn.nodemanager.pmem-check-enabled
>  false
>  
> 
>  yarn.nodemanager.recovery.dir
>  /var/log/hadoop-yarn/nodemanager/recovery-state
>  
> 
>  yarn.nodemanager.recovery.enabled
>  true
>  
> 
>  yarn.nodemanager.recovery.supervised
>  true
>  
> 
>  yarn.nodemanager.remote-app-log-dir
>  /app-logs
>  
> 
>  yarn.nodemanager.remote-app-log-dir-suffix
>  logs
>  
> 
>  yarn.nodemanager.resource-plugins
>  
>  
> 
>  yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices
>  auto
>  
> 
>  yarn.nodemanager.resource

[jira] [Commented] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2018-05-21 Thread Yiran Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483021#comment-16483021
 ] 

Yiran Wu commented on YARN-8173:


Thanks [~giovanni.fumarola]. I looked at the 
[YARN-7010|https://issues.apache.org/jira/browse/YARN-7010] logic; that function 
is more powerful. I'll modify my logic based on 
[YARN-7010|https://issues.apache.org/jira/browse/YARN-7010].

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8336:
---
Description: Missing ClientResponse.close and Client.destroy can lead to a 
connection leak.
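
As a rough sketch of the cleanup pattern that description implies (the URL and class name here are placeholders, not taken from the patch), the response and the client should be released even when the call fails:

{code:java}
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;

public class WebServiceCallSketch {
  public static String fetch(String url) {
    Client client = Client.create();
    ClientResponse response = null;
    try {
      response = client.resource(url).get(ClientResponse.class);
      return response.getEntity(String.class);
    } finally {
      if (response != null) {
        response.close();   // releases the underlying connection
      }
      client.destroy();     // shuts down the client and its resources
    }
  }
}
{code}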

> Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
> -
>
> Key: YARN-8336
> URL: https://issues.apache.org/jira/browse/YARN-8336
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8336.v1.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8336:
---
Attachment: YARN-8336.v1.patch

> Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
> -
>
> Key: YARN-8336
> URL: https://issues.apache.org/jira/browse/YARN-8336
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8336.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-8336:
--

 Summary: Fix potential connection leak in SchedConfCLI and 
YarnWebServiceUtils
 Key: YARN-8336
 URL: https://issues.apache.org/jira/browse/YARN-8336
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Giovanni Matteo Fumarola
Assignee: Giovanni Matteo Fumarola






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482993#comment-16482993
 ] 

Giovanni Matteo Fumarola commented on YARN-8173:


{{getApplications}} is a bit more complicated than the logic you implemented. 
Please take a look at YARN-7010 for the full context of the logic behind 
{{getApps}}.

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482992#comment-16482992
 ] 

Eric Yang edited comment on YARN-8259 at 5/21/18 8:15 PM:
--

If I am not mistaken, DockerContainerRuntime is running as part of node 
manager.  If hidepid option is used by system administrator, yarn user might 
not have rights to check if /proc/[pid] exists.  We might need to create a LCE 
operation to perform the check, if we are going with the suggested pid file 
check path.

I prefer the docker inspect command path with retry logic.  In a non-blocking 
IO system, it is hard to avoid coding logic for retries.  The investment will 
pay off in the long run, when each retry value is defined and optimized to make 
the system reliable and robust.


was (Author: eyang):
If I am not mistaken, DockerContainerRuntime is running as part of node 
manager.  If hidepid option is used by system administrator, yarn user might 
not have rights to check if /proc/[pid] exists.  We might need to create a LCE 
operation to perform the check, if we are going with the suggested pid file 
check path.

I still prefers the docker inspect command path with retry logic.  In a 
non-blocking IO system, it is hard to avoid coding logic for retries.  The 
investment will pay off in the long run, when each retry value is defined and 
optimized to make the system reliable and robust.

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482992#comment-16482992
 ] 

Eric Yang commented on YARN-8259:
-

If I am not mistaken, DockerContainerRuntime is running as part of node 
manager.  If hidepid option is used by system administrator, yarn user might 
not have rights to check if /proc/[pid] exists.  We might need to create a LCE 
operation to perform the check, if we are going with the suggested pid file 
check path.

I still prefers the docker inspect command path with retry logic.  In a 
non-blocking IO system, it is hard to avoid coding logic for retries.  The 
investment will pay off in the long run, when each retry value is defined and 
optimized to make the system reliable and robust.
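
A bounded retry loop of the kind being proposed might look roughly like this (illustrative sketch only; the probe itself is abstracted away and the names are hypothetical):

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative sketch: retry a liveliness probe such as docker inspect. */
public final class RetryingProbe {
  private RetryingProbe() {
  }

  public static boolean probeWithRetries(Supplier<Boolean> probe,
      int maxAttempts, long delayMs) throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (probe.get()) {
        return true;                           // container reported running
      }
      if (attempt < maxAttempts) {
        TimeUnit.MILLISECONDS.sleep(delayMs);  // back off before retrying
      }
    }
    return false;                              // give up after all retries
  }
}
{code}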

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8173) [Router] FederationClientInterceptor#

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8173:
---
Summary: [Router] FederationClientInterceptor#  (was: oozie dependent 
ClientAPI Implement )

> [Router] FederationClientInterceptor#
> -
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8173:
---
Summary: [Router] Implement missing 
FederationClientInterceptor#getApplications()  (was: [Router] 
FederationClientInterceptor#)

> [Router] Implement missing FederationClientInterceptor#getApplications()
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8173) oozie dependent ClientAPI Implement

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482986#comment-16482986
 ] 

Giovanni Matteo Fumarola commented on YARN-8173:


Moved under YARN-7402 and updated the title.

> oozie dependent ClientAPI Implement 
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8173) oozie dependent ClientAPI Implement

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8173:
---
Issue Type: Sub-task  (was: Task)
Parent: YARN-7402

> oozie dependent ClientAPI Implement 
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8173) oozie dependent ClientAPI Implement

2018-05-21 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu reassigned YARN-8173:
--

Assignee: Yiran Wu

> oozie dependent ClientAPI Implement 
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8173) oozie dependent ClientAPI Implement

2018-05-21 Thread Yiran Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482978#comment-16482978
 ] 

Yiran Wu commented on YARN-8173:


[~giovanni.fumarola], do you mind taking a look at this patch?

> oozie dependent ClientAPI Implement 
> 
>
> Key: YARN-8173
> URL: https://issues.apache.org/jira/browse/YARN-8173
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.0.0
>Reporter: Yiran Wu
>Priority: Major
> Attachments: YARN-8173.001.patch, YARN-8173.002.patch, 
> YARN-8173.003.patch
>
>
> oozie dependent method Implement
> {code:java}
> getApplications()
> getDeglationToken()
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482967#comment-16482967
 ] 

Giovanni Matteo Fumarola commented on YARN-8334:


[~botong] please take a look.

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8334:
---
Description: Missing ClientResponse.close and Client.destroy can lead to a 
connection leak.

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch
>
>
> Missing ClientResponse.close and Client.destroy can lead to a connection leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8334:
---
Attachment: YARN-8334-YARN-7402.v1.patch

> Fix potential connection leak in GPGUtils
> -
>
> Key: YARN-8334
> URL: https://issues.apache.org/jira/browse/YARN-8334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: YARN-8334-YARN-7402.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8335) Privileged docker containers' jobSubmitDir does not get successfully cleaned up

2018-05-21 Thread Eric Badger (JIRA)
Eric Badger created YARN-8335:
-

 Summary: Privileged docker containers' jobSubmitDir does not get 
successfully cleaned up
 Key: YARN-8335
 URL: https://issues.apache.org/jira/browse/YARN-8335
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Eric Badger


The jobSubmitDir directory is owned by root but is being cleaned up as the 
submitting user, which appears to be why the cleanup is failing.

{noformat}
2018-05-21 19:46:15,124 WARN  [DeletionService #0] 
privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell 
execution returned exit code: 255. Privileged Execution Operation Stderr:

Stdout: main : command provided 3
main : run as user is ebadger
main : requested yarn user is ebadger
failed to unlink 
/tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01/jobSubmitDir/job.split:
 Permission denied
failed to unlink 
/tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01/jobSubmitDir/job.splitmetainfo:
 Permission denied
failed to rmdir jobSubmitDir: Directory not empty
Error while deleting 
/tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01:
 39 (Directory not empty)

Full command array for failed execution:
[/hadoop-3.2.0-SNAPSHOT/bin/container-executor, ebadger, ebadger, 3, 
/tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01]
2018-05-21 19:46:15,124 ERROR [DeletionService #0] 
nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:deleteAsUser(848)) - DeleteAsUser for 
/tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01
 returned with exit code: 255
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=255:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:206)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:844)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.FileDeletionTask.run(FileDeletionTask.java:135)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
at org.apache.hadoop.util.Shell.run(Shell.java:902)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
... 10 more
{noformat}

{noformat}
[foo@bar hadoop]$ ls -l 
/tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_01/
total 4
drwxr-sr-x 2 root users 4096 May 21 19:45 jobSubmitDir
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8334) Fix potential connection leak in GPGUtils

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-8334:
--

 Summary: Fix potential connection leak in GPGUtils
 Key: YARN-8334
 URL: https://issues.apache.org/jira/browse/YARN-8334
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Giovanni Matteo Fumarola
Assignee: Giovanni Matteo Fumarola






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-21 Thread Yiran Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482955#comment-16482955
 ] 

Yiran Wu edited comment on YARN-8041 at 5/21/18 7:51 PM:
-

[~giovanni.fumarola] OK, thanks for telling me this and for reviewing the code. 
The previous patch has a small problem. I'll fix it.


was (Author: yiran):
Ok, Thanks tell me this and review code. The previous Patch has a little 
problem. I'll fix it.

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition

2018-05-21 Thread Kanwaljeet Sachdev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482962#comment-16482962
 ] 

Kanwaljeet Sachdev commented on YARN-4677:
--

[~wilfreds], thanks for the patch and the context on it. The diffs look good. I 
would just add a little more description noting that an NPE could occur because 
the heartbeat message might arrive after the node is decommissioned, along with 
the stack trace, so this JIRA carries the full context. Adding the trace here 
will be beneficial.

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --
>
> Key: YARN-4677
> URL: https://issues.apache.org/jira/browse/YARN-4677
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Brook Zhou
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-4677-branch-2.001.patch, 
> YARN-4677-branch-2.002.patch, YARN-4677.01.patch
>
>
> When a node is in decommissioning state, there is time window between 
> completedContainer() and RMNodeResourceUpdateEvent get handled in 
> scheduler.nodeUpdate (YARN-3223). 
> So if a scheduling effort happens within this window, the new container could 
> still get allocated on this node. Even worse case is if scheduling effort 
> happen after RMNodeResourceUpdateEvent sent out but before it is propagated 
> to SchedulerNode - then the total resource is lower than used resource and 
> available resource is a negative value. 
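
For illustration only (the numbers are made up), the window described above can leave the scheduler node with a negative available resource:

{code:java}
public class DecommissionRaceExample {
  public static void main(String[] args) {
    // The RMNodeResourceUpdateEvent has shrunk the node's total resource,
    // but a container was allocated before SchedulerNode saw the update.
    int totalVcores = 0;       // node resource after the decommission update
    int usedVcores = 4;        // containers still charged to the node
    int availableVcores = totalVcores - usedVcores;
    System.out.println("available vcores: " + availableVcores);  // prints -4
  }
}
{code}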



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-21 Thread Yiran Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482955#comment-16482955
 ] 

Yiran Wu commented on YARN-8041:


Ok, Thanks tell me this and review code. The previous Patch has a little 
problem. I'll fix it.

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482950#comment-16482950
 ] 

Giovanni Matteo Fumarola commented on YARN-8041:


[~yiran] thanks for the patch. 
You should not change the status of the JIRA every time you submit a patch. The 
system will pick up the latest patch automatically.

 

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-21 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: YARN-8041.004.patch

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch
>
>
> This Jira tracks the implementation of some missing REST invocations in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-21 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: (was: YARN-8041.004.patch)

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocations in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482912#comment-16482912
 ] 

genericqa commented on YARN-8041:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 46s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 54 new 
+ 18 unchanged - 0 fixed = 72 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
14s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 50s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
21s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 48s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8041 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924375/YARN-8041.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1e706175c57b 4.4.

[jira] [Comment Edited] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-21 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482905#comment-16482905
 ] 

Eric Yang edited comment on YARN-8326 at 5/21/18 6:57 PM:
--

This appears to be introduced by YARN-5662, which turned the container monitor from 
false to true.  This feature is used by opportunistic container scheduling and 
pre-emption to gather statistics about the containers to make scheduling decisions.  
You can disable this feature with:

{code}
<property>
  <name>yarn.nodemanager.container-monitor.enabled</name>
  <value>false</value>
</property>
{code}

Or reduce the stats collection interval from 3 seconds to 300 milliseconds (this 
uses more system resources, but gives faster scheduling):

{code}
<property>
  <name>yarn.nodemanager.container-monitor.interval-ms</name>
  <value>300</value>
</property>
{code}

Timer optimization might be possible for the work done in YARN-2883.  The 
queuing and scheduling of containers is based on information from the monitoring 
thread.  If it takes several seconds for that information to become available 
before the next container is scheduled, it can introduce an artificial delay when 
rapidly launching containers.  The timer value cannot be made smaller than a 
certain point, otherwise monitoring and container forking will both tax CPU 
resources too heavily.  If your workloads take less time than container 
scheduling/launch, you might need to revisit how to launch fewer containers and 
run more work per container.  [~hlhu...@us.ibm.com] Can you confirm whether 
those settings change the benchmark result?


was (Author: eyang):
This appears to be introduced by YARN-5662 by turning on container monitor from 
false to true.  This feature is used by opportunistic container scheduling and 
pre-emption to gather statics of the containers to make scheduling decisions.  
You can disable this feature by:

{code}
<property>
  <name>yarn.nodemanager.container-monitor.enabled</name>
  <value>false</value>
</property>
{code}

Or reduce the stats collection time from 3 seconds to 300 milliseconds (use 
more system resources, but faster scheduling):

{code}
<property>
  <name>yarn.nodemanager.container-monitor.interval-ms</name>
  <value>300</value>
</property>
{code}

Timer optimization might be possible to the work done in YARN-2883.  The 
queuing and scheduling of containers is based on monitoring thread information. 
 If it takes several seconds to wait for information to become available before 
next container is scheduled, then it can introduced artificial delay to rapidly 
launching containers.  The timer value can not be smaller than certain value 
otherwise monitoring/container forking both will tax cpu resources too much 
much.  If your workloads take less time than container scheduling/launch, then 
you might need to revisit how to decrease the containers to launch, and 
increase the work to run in containers.  [~hlhu...@us.ibm.com] Can you confirm 
that those settings changes the benchmark result?

> Yarn 3.0 seems runs slower than Yarn 2.6
> 
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
> Environment: This is the yarn-site.xml for 3.0. 
>  
> hadoop.registry.dns.bind-port = 5353
> hadoop.registry.dns.domain-name = hwx.site
> hadoop.registry.dns.enabled = true
> hadoop.registry.dns.zone-mask = 255.255.255.0
> hadoop.registry.dns.zone-subnet = 172.17.0.0
> manage.include.files = false
> yarn.acl.enable = false
> yarn.admin.acl = yarn
> yarn.client.nodemanager-connect.max-wait-ms = 6
> yarn.client.nodemanager-connect.retry-interval-ms = 1
> yarn.http.policy = HTTP_ONLY
> yarn.log-aggregation-enable = false
> yarn.log-aggregation.retain-seconds = 2592000
> yarn.log.server.url = [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs]
> yarn.log.server.web-service.url = [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory]
> yarn.node-labels.enabled = false
> yarn.node-labels.fs-store.retry-policy-spec = 2000, 500
> yarn.node-labels.fs-store.root-dir = /system/yarn/node-labels
> yarn.nodemanager.address = 0.0.0.0:45454
> yarn.nodemanager.admin-env = MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
> yarn.nodemanager.aux-services = mapreduce_shuffle,spark2_shuffle,timeline_collector
> yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
> yarn.nodemanager.aux-services.spark2_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
> yarn.nodemanager.aux-services.spark2_shuffle.classpath = /usr/spark2/aux/*
> yarn.nodemanager

[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-21 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482905#comment-16482905
 ] 

Eric Yang commented on YARN-8326:
-

This appears to be introduced by YARN-5662, which turned the container monitor from 
false to true.  This feature is used by opportunistic container scheduling and 
pre-emption to gather statistics about the containers to make scheduling decisions.  
You can disable this feature with:

{code}
<property>
  <name>yarn.nodemanager.container-monitor.enabled</name>
  <value>false</value>
</property>
{code}

Or reduce the stats collection interval from 3 seconds to 300 milliseconds (this 
uses more system resources, but gives faster scheduling):

{code}
<property>
  <name>yarn.nodemanager.container-monitor.interval-ms</name>
  <value>300</value>
</property>
{code}

Timer optimization might be possible for the work done in YARN-2883.  The 
queuing and scheduling of containers is based on information from the monitoring 
thread.  If it takes several seconds for that information to become available 
before the next container is scheduled, it can introduce an artificial delay when 
rapidly launching containers.  The timer value cannot be made smaller than a 
certain point, otherwise monitoring and container forking will both tax CPU 
resources too heavily.  If your workloads take less time than container 
scheduling/launch, you might need to revisit how to launch fewer containers and 
run more work per container.  [~hlhu...@us.ibm.com] Can you confirm whether 
those settings change the benchmark result?

> Yarn 3.0 seems runs slower than Yarn 2.6
> 
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
> Environment: This is the yarn-site.xml for 3.0. 
>  
> hadoop.registry.dns.bind-port = 5353
> hadoop.registry.dns.domain-name = hwx.site
> hadoop.registry.dns.enabled = true
> hadoop.registry.dns.zone-mask = 255.255.255.0
> hadoop.registry.dns.zone-subnet = 172.17.0.0
> manage.include.files = false
> yarn.acl.enable = false
> yarn.admin.acl = yarn
> yarn.client.nodemanager-connect.max-wait-ms = 6
> yarn.client.nodemanager-connect.retry-interval-ms = 1
> yarn.http.policy = HTTP_ONLY
> yarn.log-aggregation-enable = false
> yarn.log-aggregation.retain-seconds = 2592000
> yarn.log.server.url = [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs]
> yarn.log.server.web-service.url = [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory]
> yarn.node-labels.enabled = false
> yarn.node-labels.fs-store.retry-policy-spec = 2000, 500
> yarn.node-labels.fs-store.root-dir = /system/yarn/node-labels
> yarn.nodemanager.address = 0.0.0.0:45454
> yarn.nodemanager.admin-env = MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
> yarn.nodemanager.aux-services = mapreduce_shuffle,spark2_shuffle,timeline_collector
> yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
> yarn.nodemanager.aux-services.spark2_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
> yarn.nodemanager.aux-services.spark2_shuffle.classpath = /usr/spark2/aux/*
> yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
> yarn.nodemanager.aux-services.timeline_collector.class = org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService
> yarn.nodemanager.bind-host = 0.0.0.0
> yarn.nodemanager.container-executor.class = org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
> yarn.nodemanager.container-metrics.unregister-delay-ms = 6
> yarn.nodemanager.container-monitor.interval-ms = 3000
> yarn.nodemanager.delete.debug-delay-sec = 0
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage = 90
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb = 1000
> yarn.nodemanager.disk-health-checker.min-healthy-disks = 0.25
> yarn.nodemanager.health-checker.interval-ms = 135000
> yarn.nodemanager.health-checker.script.timeout-ms = 6
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage = false
> yarn.nodemanager.linux-container-executor.group = hadoop
> yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users = false
> yarn.nodemanager.local-dirs = /hadoop/yarn/local
> yarn.nodemanager.log-aggregation.compression-type = gz
> yarn.nod

[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482855#comment-16482855
 ] 

Eric Payne commented on YARN-8179:
--

{quote}Can we please change the {{newInstance}} call to {{Resources.none()}}? 
This will accommodate extensible resources.
{quote}
Nope. Sorry. Forget about that previous comment. It looks like 
{{Resource.newInstance(0, 0)}} already covers extensible resources.

 

I will commit this today or tomorrow.

 

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> cluster
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> all resources have been allocated to QueueA (all vcores are allocated to 
> QueueA).
> if App1 is submitted to QueueB, the over-utilized QueueA should be preempted.
> but I have hit a problem where preemption does not happen, which means the 
> App1 AM can never be allocated.
> when App1 is submitted, the pending resource requested for the App1 AM would be 
> 
> so the number of vcores that need to be preempted should be 1.
> but it can be 0 due to natural_termination_factor (default is 0.2)
> we should guarantee that the resource to preempt does not become 0 even when 
> applying natural_termination_factor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482841#comment-16482841
 ] 

Eric Payne edited comment on YARN-8179 at 5/21/18 6:22 PM:
---

[~kyungwan nam]-, I am really sorry for the long delay, and I'm also very sorry 
that I do have one more request, even though I had previously approved the 
latest patch.-
{code:java|title=PreemptableResourceCalculator#calculateResToObtainByPartitionForLeafQueues}
   if (Resources.greaterThan(rc, clusterResource, resToObtain,
 Resource.newInstance(0, 0)))
{code}
-Can we please change the {{newInstance}} call to {{Resources.none()}}? This 
will accommodate extensible resources.-


was (Author: eepayne):
[~kyungwan nam], I am really sorry for the long delay, and I'm also very sorry 
that I do have one more request, even though I had previously approved the 
latest patch.

{code:title=PreemptableResourceCalculator#calculateResToObtainByPartitionForLeafQueues}
   if (Resources.greaterThan(rc, clusterResource, resToObtain,
 Resource.newInstance(0, 0)))
{code}

Can we please change the {{newInstance}} call to {{Resources.none()}}? This 
will accommodate extensible resources.

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> cluster
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> all resources have been allocated to QueueA (all vcores are allocated to 
> QueueA).
> if App1 is submitted to QueueB, the over-utilized QueueA should be preempted.
> but I have hit a problem where preemption does not happen, which means the 
> App1 AM can never be allocated.
> when App1 is submitted, the pending resource requested for the App1 AM would be 
> 
> so the number of vcores that need to be preempted should be 1.
> but it can be 0 due to natural_termination_factor (default is 0.2)
> we should guarantee that the resource to preempt does not become 0 even when 
> applying natural_termination_factor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482841#comment-16482841
 ] 

Eric Payne commented on YARN-8179:
--

[~kyungwan nam], I am really sorry for the long delay, and I'm also very sorry 
that I do have one more request, even though I had previously approved the 
latest patch.

{code:title=PreemptableResourceCalculator#calculateResToObtainByPartitionForLeafQueues}
   if (Resources.greaterThan(rc, clusterResource, resToObtain,
 Resource.newInstance(0, 0)))
{code}

Can we please change the {{newInstance}} call to {{Resources.none()}}? This 
will accommodate extensible resources.

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> cluster
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> all resources have been allocated to QueueA (all vcores are allocated to 
> QueueA).
> if App1 is submitted to QueueB, the over-utilized QueueA should be preempted.
> but I have hit a problem where preemption does not happen, which means the 
> App1 AM can never be allocated.
> when App1 is submitted, the pending resource requested for the App1 AM would be 
> 
> so the number of vcores that need to be preempted should be 1.
> but it can be 0 due to natural_termination_factor (default is 0.2)
> we should guarantee that the resource to preempt does not become 0 even when 
> applying natural_termination_factor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482809#comment-16482809
 ] 

genericqa commented on YARN-8206:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 19s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8206 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924373/YARN-8206.011.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bbcb780999a2 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f48fec8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20811/testReport/ |
| Max. process+thread count | 435 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20811/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Sending a kill does not immediately kill docker

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482729#comment-16482729
 ] 

genericqa commented on YARN-8259:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
32s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8259 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924356/YARN-8259.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d3ca0d4182cb 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f48fec8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20808/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20808/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Revisit liveliness checks for Docker container

[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Anbang Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482706#comment-16482706
 ] 

Anbang Hu commented on YARN-8327:
-

Thanks [~giovanni.fumarola] for providing more information. Would you consider 
changing
{code:java}
++ numChars + ("\n").length() + (System.lineSeparator()
++ "End of LogType:stdout" + System.lineSeparator()).length();
{code}
to
{code:java}
++ numChars + ("\n").length() + ("End of LogType:stdout"
++ System.lineSeparator() + System.lineSeparator()).length();
{code}
This seems more aligned with the output message.

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png, 
> image-2018-05-21-09-05-49-550.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-21 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: YARN-8041.004.patch

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch, YARN-8041.004.patch
>
>
> This Jira tracks the implementation of some missing REST invocations in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-21 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482694#comment-16482694
 ] 

Eric Badger commented on YARN-8206:
---

Thanks for the review, [~jlowe]! Patch 011 addresses your comments. At this 
point {{isContainerRequestedAsPrivileged()}} is a really simple gadget function 
that might not be necessary. Makes it easier to change the implementation of 
the function, though. If you'd like me to just inline this in the caller code, 
I can do that.

> Sending a kill does not immediately kill docker containers
> --
>
> Key: YARN-8206
> URL: https://issues.apache.org/jira/browse/YARN-8206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8206.001.patch, YARN-8206.002.patch, 
> YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, 
> YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, 
> YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch
>
>
> {noformat}
> if (ContainerExecutor.Signal.KILL.equals(signal)
> || ContainerExecutor.Signal.TERM.equals(signal)) {
>   handleContainerStop(containerId, env);
> {noformat}
> Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent 
> for docker containers. However, they should actually be separate. When YARN 
> sends a SIGKILL to a process, it means for it to die immediately and not sit 
> around waiting for anything. This ensures an immediate reclamation of 
> resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task 
> might not handle the signal correctly, and will then end up as a failed task 
> instead of a killed task. This is especially bad for preemption. 
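
To make the distinction concrete, below is a minimal, hypothetical sketch (not the 
actual YARN-8206 patch and not the NodeManager's real Docker runtime code) of how 
the two signals could be mapped onto different Docker CLI commands; the class name, 
helper method, and grace-period value are illustrative only.

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class DockerSignalSketch {
  enum Signal { TERM, KILL }

  // Hypothetical helper: map YARN signals onto Docker CLI commands instead of
  // treating KILL and TERM identically.
  static void signalContainer(String dockerContainerId, Signal signal)
      throws IOException, InterruptedException {
    final List<String> cmd;
    if (signal == Signal.KILL) {
      // Immediate teardown: "docker kill" sends SIGKILL straight to the container.
      cmd = Arrays.asList("docker", "kill", "--signal=KILL", dockerContainerId);
    } else {
      // Graceful stop: "docker stop" sends SIGTERM, then SIGKILL after the grace period.
      cmd = Arrays.asList("docker", "stop", "--time=10", dockerContainerId);
    }
    new ProcessBuilder(cmd).inheritIO().start().waitFor();
  }
}
{code}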



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8256) Pluggable provider for node membership management

2018-05-21 Thread Dagang Wei (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482695#comment-16482695
 ] 

Dagang Wei commented on YARN-8256:
--

Friendly ping.

> Pluggable provider for node membership management
> -
>
> Key: YARN-8256
> URL: https://issues.apache.org/jira/browse/YARN-8256
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.8.3, 3.0.2
>Reporter: Dagang Wei
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> h1. Background
> HDFS-7541 introduced a pluggable provider framework for node membership 
> management, which gives HDFS the flexibility to have different ways to manage 
> node membership for different needs.
> [org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java]
>  is the class which provides the abstraction. Currently, there are 2 
> implementations in the HDFS codebase:
> 1) 
> [org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java]
>  which uses 2 config files which are defined by the properties dfs.hosts and 
> dfs.hosts.exclude.
> 2) 
> [org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java]
>  which uses a single JSON file defined by the property dfs.hosts.
> dfs.namenode.hosts.provider.classname is the property that determines which 
> implementation is used.
> h1. Problem
> YARN should be consistent with HDFS in terms of a pluggable provider for node 
> membership management. Without one, YARN cannot support other config sources, 
> e.g., ZooKeeper, a database, or other config file formats.
> h1. Proposed solution
> [org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java]
>  is the class for managing YARN node membership today. It uses 
> [HostsFileReader|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java]
>  to read config files specified by the property 
> yarn.resourcemanager.nodes.include-path for nodes to include and 
> yarn.resourcemanager.nodes.nodes.exclude-path for nodes to exclude.
> The proposed solution is to
> 1) introduce a new interface {color:#008000}HostsConfigManager{color} which 
> provides the abstraction for node membership management. Update 
> {color:#008000}NodeListManager{color} to depend on 
> {color:#008000}HostsConfigManager{color} instead of 
> {color:#008000}HostsFileReader{color}. Then create a wrapper class for 
> {color:#008000}HostsFileReader{color} which implements the interface.
> 2) introduce a new config property 
> {color:#008000}yarn.resourcemanager.hosts-config.manager.class{color} for 
> specifying the implementation class. Set the default value to the wrapper 
> class of {color:#008000}HostsFileReader{color} for backward compatibility 
> between new code and old config.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-21 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8206:
--
Attachment: YARN-8206.011.patch

> Sending a kill does not immediately kill docker containers
> --
>
> Key: YARN-8206
> URL: https://issues.apache.org/jira/browse/YARN-8206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8206.001.patch, YARN-8206.002.patch, 
> YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, 
> YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, 
> YARN-8206.009.patch, YARN-8206.010.patch, YARN-8206.011.patch
>
>
> {noformat}
> if (ContainerExecutor.Signal.KILL.equals(signal)
> || ContainerExecutor.Signal.TERM.equals(signal)) {
>   handleContainerStop(containerId, env);
> {noformat}
> Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent 
> for docker containers. However, they should actually be separate. When YARN 
> sends a SIGKILL to a process, it means for it to die immediately and not sit 
> around waiting for anything. This ensures an immediate reclamation of 
> resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task 
> might not handle the signal correctly, and will then end up as a failed task 
> instead of a killed task. This is especially bad for preemption. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-21 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8333:

Description: 
For scaling stateless containers, it would be great to support DNS round robin 
for fault tolerance and load balancing.  The current DNS record format for 
RegistryDNS is [container-instance].[application-name].[username].[domain].  
For example:

{code}
appcatalog-0.appname.hbase.ycluster. IN A 123.123.123.120
appcatalog-1.appname.hbase.ycluster. IN A 123.123.123.121
appcatalog-2.appname.hbase.ycluster. IN A 123.123.123.122
appcatalog-3.appname.hbase.ycluster. IN A 123.123.123.123
{code}

It would be nice to add a multi-A record that contains all IP addresses of the 
same component, in addition to the instance-based records.  For example:

{code}
appcatalog.appname.hbase.ycluster. IN A 123.123.123.120
appcatalog.appname.hbase.ycluster. IN A 123.123.123.121
appcatalog.appname.hbase.ycluster. IN A 123.123.123.122
appcatalog.appname.hbase.ycluster. IN A 123.123.123.123
{code}
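
For illustration, a client resolving such a component-level name sees all of the 
container addresses and can pick one per request. Below is a minimal sketch using 
plain JDK name resolution; the hostname is the example name from above.

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ThreadLocalRandom;

public class MultiARecordLookup {
  public static void main(String[] args) throws UnknownHostException {
    String name = "appcatalog.appname.hbase.ycluster";

    // Returns every A record published for the name.
    InetAddress[] addresses = InetAddress.getAllByName(name);
    for (InetAddress address : addresses) {
      System.out.println(address.getHostAddress());
    }

    // Naive client-side load balancing: pick a random address per request.
    InetAddress chosen =
        addresses[ThreadLocalRandom.current().nextInt(addresses.length)];
    System.out.println("using " + chosen.getHostAddress());
  }
}
{code}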


> Load balance YARN services using RegistryDNS multiple A records
> ---
>
> Key: YARN-8333
> URL: https://issues.apache.org/jira/browse/YARN-8333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Priority: Major
>
> For scaling stateless containers, it would be great to support DNS round 
> robin for fault tolerance and load balancing.  The current DNS record format 
> for RegistryDNS is 
> [container-instance].[application-name].[username].[domain].  For example:
> {code}
> appcatalog-0.appname.hbase.ycluster. IN A 123.123.123.120
> appcatalog-1.appname.hbase.ycluster. IN A 123.123.123.121
> appcatalog-2.appname.hbase.ycluster. IN A 123.123.123.122
> appcatalog-3.appname.hbase.ycluster. IN A 123.123.123.123
> {code}
> It would be nice to add a multi-A record that contains all IP addresses of the 
> same component, in addition to the instance-based records.  For example:
> {code}
> appcatalog.appname.hbase.ycluster. IN A 123.123.123.120
> appcatalog.appname.hbase.ycluster. IN A 123.123.123.121
> appcatalog.appname.hbase.ycluster. IN A 123.123.123.122
> appcatalog.appname.hbase.ycluster. IN A 123.123.123.123
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482689#comment-16482689
 ] 

Eric Payne commented on YARN-8292:
--

Thanks [~leftnoteasy] for your work on this issue.

- I don't think this is necessary.
{code:title=AbstractPreemptableResourceCalculator#computeFixpointAllocation}
  Resource dupUnassignedForTheRound = Resources.clone(unassigned);
{code}


- I'm concerned about checking for {{any resource <= 0}} before preempting for 
intra-queue preemption. When extended resources are used, won't this prevent 
any preemption in a queue where none of the apps used the extended resource?
{code:title=CapacitySchedulerPreemptionUtils#tryPreemptContainerAndDeductResToObtain}
  if (conservativeDRF) {
doPreempt = !Resources.isAnyMajorResourceZeroOrNegative(rc,
toObtainByPartition);
  } else{
{code}
For example, if gpu is the extended resource, but no apps are currently using 
gpu in the queue, no intra-queue preemption will take place.
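
To make the concern concrete, here is a toy stand-in (illustrative values and a 
simplified check, not the Hadoop implementation) showing how a single zero 
component, such as an unused gpu resource, would veto preemption under an "any 
component <= 0" guard even though memory and vcores are still positive:

{code:java}
public class AnyZeroVetoExample {

  // Simplified stand-in for an "any major resource zero or negative" check.
  static boolean anyZeroOrNegative(long... components) {
    for (long component : components) {
      if (component <= 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Resource-to-obtain vector: <memory MB, vcores, gpus>; no app asked for gpus.
    long[] toObtain = {4096, 4, 0};
    boolean conservativeDRF = true;

    boolean doPreempt = !(conservativeDRF && anyZeroOrNegative(toObtain));
    // Prints "doPreempt = false": the zero gpu component alone blocks preemption.
    System.out.println("doPreempt = " + doPreempt);
  }
}
{code}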


> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. Total resource of the cluster is 30:18:6
> For both of a/b, there're 3 containers running, each of container is 2:2:1.
> Queue c uses 0 resource, and have 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-21 Thread Eric Yang (JIRA)
Eric Yang created YARN-8333:
---

 Summary: Load balance YARN services using RegistryDNS multiple A 
records
 Key: YARN-8333
 URL: https://issues.apache.org/jira/browse/YARN-8333
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Eric Yang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482664#comment-16482664
 ] 

Giovanni Matteo Fumarola commented on YARN-8327:


Thanks [~huanbang1993] for the feedback. I copied the wrong picture.

!image-2018-05-21-09-05-49-550.png!

In Linux I can see:
xx*\n*End of LogType:stdout*\n\n*

My initial statement about \r\n\n was incorrect. The problem is the \n 
before the "End of LogType". 

 

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png, 
> image-2018-05-21-09-05-49-550.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8327:
---
Attachment: image-2018-05-21-09-05-49-550.png

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png, 
> image-2018-05-21-09-05-49-550.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8327:
---
Attachment: (was: TestAggregatedLogFormat.png)

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481367#comment-16481367
 ] 

Giovanni Matteo Fumarola edited comment on YARN-8327 at 5/21/18 4:03 PM:
-

Before the patch:

!TestAggregatedLogFormat.png!

After the patch:

!image-2018-05-18-16-52-08-250.png!

I validated in Linux as well.

The length of System.lineSeparator() on Windows is 2, while on Linux it is 1.
 In the log line there are 2 consecutive line separators, which on Windows are 
converted to \r\n\n. Because of this, I changed every *\n* but one.
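
A minimal illustration of the platform difference being described here (expected 
values noted in the comments):

{code:java}
public class LineSeparatorLength {
  public static void main(String[] args) {
    String sep = System.lineSeparator();

    // 1 on Linux ("\n"), 2 on Windows ("\r\n").
    System.out.println("separator length: " + sep.length());

    // The expected tail of the aggregated log therefore has a different length
    // on each platform, which is what the test has to account for.
    String tail = sep + "End of LogType:stdout" + sep;
    System.out.println("tail length: " + tail.length());
  }
}
{code}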


was (Author: giovanni.fumarola):
Before the patch:

!file:///C:/Users/gifuma/AppData/Local/Temp/msohtmlclip1/01/clip_image001.png|width=486,height=111!

After the patch:

!image-2018-05-18-16-52-08-250.png!

I validated in Linux as well.

The length of System.lineSeparator() in Windows is equal to 2 while in Linux is 
1.
 In the logline, there are 2 consecutive line separators that in Windows are 
converted to \r\n\n. Due to this, I changed every *\n* but one.

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: TestAggregatedLogFormat.png, YARN-8327.v1.patch, 
> image-2018-05-18-16-52-08-250.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8327:
---
Attachment: TestAggregatedLogFormat.png

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: TestAggregatedLogFormat.png, YARN-8327.v1.patch, 
> image-2018-05-18-16-52-08-250.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481367#comment-16481367
 ] 

Giovanni Matteo Fumarola edited comment on YARN-8327 at 5/21/18 4:02 PM:
-

Before the patch:

!file:///C:/Users/gifuma/AppData/Local/Temp/msohtmlclip1/01/clip_image001.png|width=486,height=111!

After the patch:

!image-2018-05-18-16-52-08-250.png!

I validated in Linux as well.

The length of System.lineSeparator() on Windows is 2, while on Linux it is 1.
In the log line there are 2 consecutive line separators, which on Windows are 
converted to \r\n\n. Because of this, I changed every *\n* but one.


was (Author: giovanni.fumarola):
Before the patch:

!image-2018-05-18-16-51-47-324.png!

After the patch:

!image-2018-05-18-16-52-08-250.png!

I validated in Linux as well.

The length of System.lineSeparator() on Windows is 2, while on Linux it is 1.
In the log line there are 2 consecutive line separators, which on Windows are 
converted to \r\n\n. Because of this, I changed every *\n* but one.

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8327) Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows

2018-05-21 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8327:
---
Attachment: (was: image-2018-05-18-16-51-47-324.png)

> Fix TestAggregatedLogFormat#testReadAcontainerLogs1 on Windows
> --
>
> Key: YARN-8327
> URL: https://issues.apache.org/jira/browse/YARN-8327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8327.v1.patch, image-2018-05-18-16-52-08-250.png
>
>
> TestAggregatedLogFormat#testReadAcontainerLogs1 fails on Windows because of 
> the line separator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-4353) Provide short circuit user group mapping for NM/AM

2018-05-21 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-4353.

  Resolution: Won't Fix
Hadoop Flags: Reviewed

I'm fine with closing this out.  I added {{NullGroupsMapping}} in order to 
resolve this JIRA, but I never felt confident enough to pull the trigger.

> Provide short circuit user group mapping for NM/AM
> --
>
> Key: YARN-4353
> URL: https://issues.apache.org/jira/browse/YARN-4353
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: YARN-4353.prelim.patch
>
>
> When the NM launches an AM, the {{ContainerLocalizer}} gets the current user 
> from {{UserGroupInformation}}, which triggers user group mapping, even though 
> the user groups are never accessed.  If secure LDAP is configured for group 
> mapping, then there are some additional complications created by the 
> unnecessary group resolution.  Additionally, it adds unnecessary latency to 
> the container launch time.
> To address the issue, before getting the current user, the 
> {{ContainerLocalizer}} should configure {{UserGroupInformation}} with a null 
> group mapping service that quickly and quietly returns an empty group list 
> for all users.
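
For reference, a minimal sketch of what such a short-circuit mapping could look like, assuming the classic three-method {{GroupMappingServiceProvider}} interface (getGroups, cacheGroupsRefresh, cacheGroupsAdd); the class name here is illustrative, the real {{NullGroupsMapping}} may differ in detail, and newer Hadoop versions may require additional overrides:

{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.security.GroupMappingServiceProvider;

/** Returns an empty group list for every user, skipping real group resolution. */
public class NoOpGroupsMapping implements GroupMappingServiceProvider {

  @Override
  public List<String> getGroups(String user) throws IOException {
    // No LDAP or shell lookup: quickly and quietly report no groups.
    return Collections.emptyList();
  }

  @Override
  public void cacheGroupsRefresh() throws IOException {
    // Nothing is cached, so there is nothing to refresh.
  }

  @Override
  public void cacheGroupsAdd(List<String> groups) throws IOException {
    // Caching is a no-op as well.
  }
}
{code}

Such a class would typically be wired in via the hadoop.security.group.mapping configuration key before the first UserGroupInformation call.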



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8328) NonAggregatingLogHandler needlessly waits upon shutdown if delayed deletion is scheduled

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482643#comment-16482643
 ] 

Jason Lowe commented on YARN-8328:
--

This is making a lot of unit tests take a lot longer than they should.  
NonAggregatingLogHandler is keeping applications alive as long as the logs are 
around, and the NM tries to wait for applications to finish cleaning up when it 
tears down.  That leads to long delays in unit tests that use the minicluster.

My initial reaction was the minicluster should automatically set the log 
deletion to 0, but I can see the desire to keep container logs around for test 
debugging.  What if there was a way to tell the non-aggregating log handler to 
_not_ delete logs?  Applications would then complete quickly rather than linger 
as they are doing today.  I propose the minicluster ask for this by default, as 
IMHO it's the responsibility of the unit test running the minicluster to 
perform cleanup of the test directories (including the minicluster local and 
log dirs), and that would leave the option to the minicluster user whether to 
keep logs for debugging or clean them up with all the other test data.  
Thoughts?


> NonAggregatingLogHandler needlessly waits upon shutdown if delayed deletion 
> is scheduled
> 
>
> Key: YARN-8328
> URL: https://issues.apache.org/jira/browse/YARN-8328
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Priority: Major
> Attachments: YARN-8328.001.patch
>
>
> This happens frequently in the MiniYarnCluster setup where a job completes 
> and then the cluster is shut down. Often the jobs are scheduled to delay 
> deletion of the log files so that container logs are available for debugging. 
> A scheduled log deletion has a default value of 3 hours. When the 
> NonAggregatingLogHandler is stopped it waits 10 seconds for all delayed 
> scheduled log deletions to occur and then shuts down.
> A test developer has to make a trade-off: either 1) set the log deletion to 
> 0 to have the tests shut down quickly but get no container logs to debug, or 2) 
> keep the log deletion at 3 hours and incur the 10-second timeout for each 
> test suite.
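
For reference, the 3-hour delay corresponds to the NM log retention setting, so a test can trade debuggability for shutdown speed roughly as follows (the property constant is my reading of the default behavior, so verify it against the version in use):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MiniClusterLogRetentionExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Default is 10800 seconds (3 hours). Setting 0 deletes logs as soon as the
    // application finishes, so NM shutdown does not wait on scheduled deletions,
    // at the cost of losing container logs for debugging.
    conf.setLong(YarnConfiguration.NM_LOG_RETAIN_SECONDS, 0);
    System.out.println(conf.get(YarnConfiguration.NM_LOG_RETAIN_SECONDS));
  }
}
{code}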



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for privileged Docker containers

2018-05-21 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482640#comment-16482640
 ] 

Shane Kumpf commented on YARN-8259:
---

Attaching a patch that addresses the issue.

Unfortunately, I found a couple of problems with using the docker API/CLI that 
made it necessary to go in a different direction. The liveliness check will fail 
during a docker daemon restart with live restore enabled. We would need to add 
retries to handle this case, but that is brittle. Also, the reacquisition code 
runs {{signalContainer}} once per second until the application finishes, which 
resulted in many {{docker inspect}} and {{container-executor}} calls, both of 
which are expensive operations.

After testing the suggested approaches, I went with checking for the existence 
of /proc/<pid>. The obvious con of using /proc is portability, but I'm not sure 
this is really an issue given existing features of the docker runtime. The 
patch moves both privileged and non-privileged Docker container liveliness 
checks to use /proc/<pid>, saving the c-e call.
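
As an illustration of the general approach (a sketch only, not the attached patch; the class and helper names are hypothetical), a /proc-based liveness check can be as simple as:

{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;

public final class ProcLivenessExample {

  /** True if a /proc entry exists for the given pid (Linux-specific). */
  public static boolean isAlive(String pid) {
    // Avoids exec'ing container-executor or docker inspect for every check.
    return Files.exists(Paths.get("/proc", pid));
  }

  public static void main(String[] args) {
    String pid = args.length > 0 ? args[0] : "1";
    System.out.println("pid " + pid + " alive: " + isAlive(pid));
  }
}
{code}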

Finally, the logging around signalling failures needs work, but this impacts 
all runtimes under LCE, so I'd prefer to address it in a separate issue. I'll 
open an issue on this.

> Revisit liveliness checks for privileged Docker containers
> --
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8259:
--
Summary: Revisit liveliness checks for Docker containers  (was: Revisit 
liveliness checks for privileged Docker containers)

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8259) Revisit liveliness checks for privileged Docker containers

2018-05-21 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8259:
--
Attachment: YARN-8259.001.patch

> Revisit liveliness checks for privileged Docker containers
> --
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482627#comment-16482627
 ] 

Hudson commented on YARN-8248:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14244 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14244/])
YARN-8248. Job hangs when a job requests a resource that its queue does 
(haibochen: rev f48fec83d0f2d1a781a141ad7216463c5526321f)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java


> Job hangs when a job requests a resource that its queue does not have
> -
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch, YARN-8248-007.patch, YARN-8248-008.patch, 
> YARN-8248-009.patch, YARN-8248-010.patch, YARN-8248-011.patch, 
> YARN-8248-012.patch, YARN-8248-013.patch, YARN-8248-014.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other)
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:java}
>  
> 1 mb,0vcores
> 9 mb,0vcores
> 50
> -1.0f
> 2.0
> fair
>   
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482584#comment-16482584
 ] 

Jason Lowe commented on YARN-8206:
--

Thanks for updating the patch!  Unfortunately the patch no longer applies after 
YARN-8141 and needs to be updated.

allowPrivilegedContainerExecution will now log a warning every time a 
privileged container is not requested, which it did not do before.  I'm 
wondering if a warning log is really ever needed here.  It's already logging 
when a privileged container is requested, so if that log does not appear we 
can infer that the env variable was not set properly.

Nit: isContainerRequestedAsPrivileged can be simplified with 
Boolean.parseBoolean, which handles null directly.

Nit: Since handleContainerKill is being refactored, it would be good to improve 
the debug logging to leverage the SLF4J API so it doesn't have to do a debug 
check, i.e. leveraging the \{\} positional parameter syntax so the message 
does not have to be built when the log message will be discarded.
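
To illustrate the two nits (the class, method, and environment-key names below are assumptions for the sketch, not the actual patch; verify the exact env key against the Docker runtime code):

{code:java}
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReviewNitsExample {
  private static final Logger LOG = LoggerFactory.getLogger(ReviewNitsExample.class);

  // Assumed env key; verify the exact constant in the Docker runtime.
  private static final String PRIVILEGED_ENV_KEY =
      "YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER";

  /** Boolean.parseBoolean returns false for null, so no explicit null check is needed. */
  static boolean isContainerRequestedAsPrivileged(Map<String, String> env) {
    return Boolean.parseBoolean(env.get(PRIVILEGED_ENV_KEY));
  }

  static void handleSignal(String containerId, String signal) {
    // SLF4J {} placeholders: the message is only formatted if DEBUG is enabled,
    // so no LOG.isDebugEnabled() guard is required.
    LOG.debug("Handling signal {} for container {}", signal, containerId);
  }
}
{code}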


> Sending a kill does not immediately kill docker containers
> --
>
> Key: YARN-8206
> URL: https://issues.apache.org/jira/browse/YARN-8206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8206.001.patch, YARN-8206.002.patch, 
> YARN-8206.003.patch, YARN-8206.004.patch, YARN-8206.005.patch, 
> YARN-8206.006.patch, YARN-8206.007.patch, YARN-8206.008.patch, 
> YARN-8206.009.patch, YARN-8206.010.patch
>
>
> {noformat}
> if (ContainerExecutor.Signal.KILL.equals(signal)
> || ContainerExecutor.Signal.TERM.equals(signal)) {
>   handleContainerStop(containerId, env);
> {noformat}
> Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent 
> for docker containers. However, they should actually be separate. When YARN 
> sends a SIGKILL to a process, it means for it to die immediately and not sit 
> around waiting for anything. This ensures an immediate reclamation of 
> resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task 
> might not handle the signal correctly, and will then end up as a failed task 
> instead of a killed task. This is especially bad for preemption. 
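
One way the separation described above might look in outline (a sketch with hypothetical helper names, not the committed change):

{code:java}
// Hypothetical sketch: signal-specific handling instead of treating KILL and TERM the same.
enum Signal { KILL, TERM, QUIT }

class DockerSignalHandlingSketch {
  void handle(String containerId, Signal signal) {
    if (signal == Signal.KILL) {
      // SIGKILL: reclaim resources immediately (e.g. an immediate docker kill).
      killNow(containerId);
    } else if (signal == Signal.TERM) {
      // SIGTERM: let the task exit cleanly (e.g. a docker stop with a grace period).
      stopGracefully(containerId);
    }
  }

  void killNow(String containerId) { /* issue an immediate kill */ }
  void stopGracefully(String containerId) { /* issue a graceful stop */ }
}
{code}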



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-21 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482576#comment-16482576
 ] 

Haibo Chen commented on YARN-8248:
--

I'm okay with not fixing this one. But it is indeed a convention to cap static 
variables, IIRC.

+1 checking it in.
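
For readers unfamiliar with the convention being referenced, "cap" presumably means the usual Java style of writing static final constants in upper case:

{code:java}
public class ConstantNamingExample {
  static final int MAX_WAIT_MS = 5000;    // follows the convention
  static final int defaultWaitMs = 5000;  // the style a checkstyle nit would flag

  public static void main(String[] args) {
    System.out.println(MAX_WAIT_MS + " " + defaultWaitMs);
  }
}
{code}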

> Job hangs when a job requests a resource that its queue does not have
> -
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch, YARN-8248-007.patch, YARN-8248-008.patch, 
> YARN-8248-009.patch, YARN-8248-010.patch, YARN-8248-011.patch, 
> YARN-8248-012.patch, YARN-8248-013.patch, YARN-8248-014.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other)
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:java}
>  
> 1 mb,0vcores
> 9 mb,0vcores
> 50
> -1.0f
> 2.0
> fair
>   
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7960) Add no-new-privileges flag to docker run

2018-05-21 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482548#comment-16482548
 ] 

Eric Badger commented on YARN-7960:
---

Test doesn't fail for me locally and is in RM code, so it's unrelated.

> Add no-new-privileges flag to docker run
> 
>
> Key: YARN-7960
> URL: https://issues.apache.org/jira/browse/YARN-7960
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7960.001.patch, YARN-7960.002.patch
>
>
> Minimally, this should be used for unprivileged containers. It's a cheap way 
> to add an extra layer of security to the docker model. For privileged 
> containers, it might be appropriate to omit this flag
> https://github.com/moby/moby/pull/20727
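
For context, the flag in question is docker run's --security-opt no-new-privileges option; a rough sketch of appending it for non-privileged containers only (class, variable, and image names below are illustrative, not the patch):

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NoNewPrivilegesExample {
  public static void main(String[] args) {
    boolean privilegedRequested = false;  // illustrative value

    List<String> dockerRun = new ArrayList<>(Arrays.asList(
        "docker", "run", "--name", "container_example"));
    if (!privilegedRequested) {
      // Prevents the containerized process from gaining new privileges
      // (e.g. via setuid binaries); typically omitted for privileged containers.
      dockerRun.add("--security-opt=no-new-privileges");
    }
    dockerRun.add("example-image:latest");

    System.out.println(String.join(" ", dockerRun));
  }
}
{code}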



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8103) Add CLI interface to query node attributes

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482526#comment-16482526
 ] 

genericqa commented on YARN-8103:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-3409 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
29s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
31s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
34s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  8m 
11s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
43s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
0s{color} | {color:green} YARN-3409 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m  
4s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 32s{color} | {color:orange} root: The patch generated 13 new + 484 unchanged 
- 27 fixed = 497 total (was 511) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  8m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
25s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
15s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 2 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
53s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 25s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 18s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} uni

[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482427#comment-16482427
 ] 

genericqa commented on YARN-8248:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 38s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 257 unchanged - 1 fixed = 258 total (was 258) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 
28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8248 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924323/YARN-8248-014.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b9cdbcec529c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a23ff8d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20807/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20807/testReport/ |
| Max. process+thread count | 819 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resou

[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-21 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-8320:

Attachment: YARN-8320.001.patch

> Add support CPU isolation for latency-sensitive  (LS) service
> -
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; 
> no support for differentiated latency
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot afford 
> in our production environment.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors; this is inspired by the isolation 
> technique in [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
>  Later I will upload a detailed design doc.
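
For readers unfamiliar with the cpuset controller, a rough sketch of the pinning mechanism (assuming cgroup v1 mounted at /sys/fs/cgroup/cpuset; the group name and pid are illustrative and this is not the proposed patch):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CpusetPinningExample {

  /** Creates a cpuset cgroup and pins the given pid to the given CPUs and memory nodes. */
  static void pin(String group, String cpus, String mems, long pid) throws IOException {
    Path base = Paths.get("/sys/fs/cgroup/cpuset", group);
    Files.createDirectories(base);
    // cpuset.cpus and cpuset.mems must be populated before tasks can be added.
    Files.write(base.resolve("cpuset.cpus"), cpus.getBytes());
    Files.write(base.resolve("cpuset.mems"), mems.getBytes());
    Files.write(base.resolve("tasks"), Long.toString(pid).getBytes());
  }

  public static void main(String[] args) throws IOException {
    // Bind PID 12345 to cores 2-3 on memory node 0 (requires root and cgroup v1).
    pin("yarn/container_example", "2-3", "0", 12345L);
  }
}
{code}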



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-21 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-8320:

Attachment: (was: YARN-8320.001.patch)

> Add support CPU isolation for latency-sensitive  (LS) service
> -
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; 
> no support for differentiated latency
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot afford 
> in our production environment.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors; this is inspired by the isolation 
> technique in [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
>  Later I will upload a detailed design doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-21 Thread Jiandan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482336#comment-16482336
 ] 

Jiandan Yang  commented on YARN-8320:
-

Uploaded v1 patch to initiate discussion.

> Add support CPU isolation for latency-sensitive  (LS) service
> -
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; 
> no support for differentiated latency
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot afford 
> in our production environment.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors; this is inspired by the isolation 
> technique in [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
>  Later I will upload a detailed design doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8320) Add support CPU isolation for latency-sensitive (LS) service

2018-05-21 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-8320:

Attachment: YARN-8320.001.patch

> Add support CPU isolation for latency-sensitive  (LS) service
> -
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; 
> no support for differentiated latency
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, which latency-sensitive services cannot afford 
> in our production environment.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors; this is inspired by the isolation 
> technique in [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
>  Later I will upload a detailed design doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-21 Thread Szilard Nemeth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8248:
-
Attachment: YARN-8248-014.patch

> Job hangs when a job requests a resource that its queue does not have
> -
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch, YARN-8248-007.patch, YARN-8248-008.patch, 
> YARN-8248-009.patch, YARN-8248-010.patch, YARN-8248-011.patch, 
> YARN-8248-012.patch, YARN-8248-013.patch, YARN-8248-014.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other)
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:java}
>  
> 1 mb,0vcores
> 9 mb,0vcores
> 50
> -1.0f
> 2.0
> fair
>   
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482328#comment-16482328
 ] 

genericqa commented on YARN-8248:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 2 new + 257 unchanged - 1 fixed = 259 total (was 258) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 67m 
30s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}125m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8248 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924314/YARN-8248-013.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ea2c3aae2b4c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a23ff8d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20804/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20804/testReport/ |
| Max. process+thread count | 870 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resou

[jira] [Updated] (YARN-8103) Add CLI interface to query node attributes

2018-05-21 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8103:
---
Attachment: YARN-8103-YARN-3409.002.patch

> Add CLI interface to  query node attributes
> ---
>
> Key: YARN-8103
> URL: https://issues.apache.org/jira/browse/YARN-8103
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8103-YARN-3409.001.patch, 
> YARN-8103-YARN-3409.002.patch, YARN-8103-YARN-3409.WIP.patch
>
>
> YARN-8100 will add an API interface for querying the attributes. This ticket 
> adds a CLI interface for querying node attributes for each node and listing 
> all attributes in the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


