[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048225#comment-15048225
 ] 

ASF GitHub Bot commented on YARN-4434:
--

Github user aajisaka commented on the pull request:

https://github.com/apache/hadoop/pull/62#issuecomment-163142562
  
Thank you for the pull request. I reviewed this patch (A) and the other 
patch attached to the YARN-4434 JIRA (B), and decided to commit patch (B) because 
patch (B) also replaces "i.e. the entire disk" with "i.e. 90% of the disk".
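
For anyone who wants to double-check the effective default programmatically, here is a 
quick sketch (it assumes the usual YarnConfiguration constants for this property; adjust 
the names if they differ in your branch):

{code:title=PrintDiskUtilizationDefault.java (sketch)}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrintDiskUtilizationDefault {
  public static void main(String[] args) {
    // a plain YarnConfiguration picks up yarn-default.xml; with no override
    // in yarn-site.xml this should print 90.0, matching the corrected docs
    Configuration conf = new YarnConfiguration();
    System.out.println(conf.getFloat(
        YarnConfiguration.NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE,
        YarnConfiguration.DEFAULT_NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE));
  }
}
{code}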


> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048226#comment-15048226
 ] 

ASF GitHub Bot commented on YARN-4434:
--

Github user aajisaka commented on the pull request:

https://github.com/apache/hadoop/pull/62#issuecomment-163142634
  
I've committed patch (B), so could you please close this pull request?


> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4301) NM disk health checker should have a timeout

2015-12-08 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4301:
-
Assignee: Akihiro Suda

> NM disk health checker should have a timeout
> 
>
> Key: YARN-4301
> URL: https://issues.apache.org/jira/browse/YARN-4301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akihiro Suda
>Assignee: Akihiro Suda
> Attachments: YARN-4301-1.patch, YARN-4301-2.patch, 
> concept-async-diskchecker.txt
>
>
> The disk health checker [verifies a 
> disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385]
>  by executing {{mkdir}} and {{rmdir}} periodically.
> If these operations do not return within a moderate timeout, the disk should be 
> marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}.
> I confirmed that current YARN does not have such an implicit timeout (on JDK7, 
> Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our 
> fault injector for distributed systems.
> (I'll share the reproduction script shortly.)
> I think we can fix this issue by making 
> [{{NodeHealthCheckerService.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73]
>  return {{false}} if the value of {{this.getLastHealthReportTime()}} is too 
> old.
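
A minimal sketch of the check proposed above ({{maxAllowedGapMs}} is a hypothetical 
threshold field used only for illustration; the real patch may wire it from 
configuration):

{code:title=NodeHealthCheckerService.java (sketch of the proposed staleness check)}
public boolean isHealthy() {
  boolean scriptHealthy = nodeHealthScriptRunner == null
      || nodeHealthScriptRunner.isHealthy();
  // hypothetical staleness check: if the last disk check finished too long ago,
  // assume the checker is stuck (e.g. mkdir blocked) and report unhealthy
  long sinceLastCheck = System.currentTimeMillis()
      - dirsHandler.getLastDisksCheckTime();
  boolean diskCheckFresh = sinceLastCheck <= maxAllowedGapMs;
  return scriptHealthy && dirsHandler.areDisksHealthy() && diskCheckFresh;
}
{code}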



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4301) NM disk health checker should have a timeout

2015-12-08 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048221#comment-15048221
 ] 

Tsuyoshi Ozawa commented on YARN-4301:
--

[~suda] thank you for pointing this out. I have some comments about the v2 patch - 
could you address them?

1. About the synchronization of DirectoryCollection, I see the point you mentioned. 
The change, however, introduces race conditions between the states in the class 
(localDirs, fullDirs, errorDirs, and numFailures) - e.g. 
{{DirectoryCollection.concat(errorDirs, fullDirs)}}, {{createNonExistentDirs}}, and 
other methods cannot work correctly without synchronization.

I think the root cause of the problem is calling {{DC.testDirs}} while holding the 
lock in {{DC.checkDirs}}. How about releasing the lock before calling {{testDirs}} 
and re-acquiring it after {{testDirs}} returns? (A rough sketch follows after the 
quote below.)

{quote}
synchronized DC.getFailedDirs() can be blocked by synchronized DC.checkDirs(), 
when File.mkdir() (called from DC.checkDirs(), via DC.testDirs()) does not 
return in a moderate timeout.
Hence NodeHealthCheckerService.isHealthy() also gets blocked.
So I would like to make the DC.getXXXs methods unsynchronized.
{quote}
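
A rough sketch of that lock-splitting idea ({{updateDirsAfterTest}} is a hypothetical 
helper standing in for the state updates {{checkDirs}} does today, and the 
{{testDirs}} signature is assumed):

{code:title=DirectoryCollection.checkDirs() (sketch)}
List<String> checkDirs() {
  List<String> dirsToTest;
  synchronized (this) {
    // snapshot the shared state while holding the lock
    dirsToTest = new ArrayList<String>(localDirs);
    dirsToTest.addAll(errorDirs);
    dirsToTest.addAll(fullDirs);
  }
  // the potentially blocking mkdir/rmdir checks run outside the lock
  Map<String, DiskErrorInformation> failed = testDirs(dirsToTest);
  synchronized (this) {
    // re-acquire the lock to update localDirs, fullDirs, errorDirs and numFailures
    return updateDirsAfterTest(failed);
  }
}
{code}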

2. If the thread is preempted by the OS and moves to another CPU in a multicore 
environment, the gap can be a negative value. Hence I prefer not to abort the 
NodeManager here.
{code:title=NodeHealthCheckerService.java}
+long diskCheckTime = dirsHandler.getLastDisksCheckTime();
+long now = System.currentTimeMillis();
+long gap = now - diskCheckTime;
+if (gap < 0) {
+  throw new AssertionError("implementation error - now=" + now
+  + ", diskCheckTime=" + diskCheckTime);
+}
{code}
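
If we want to keep some guard there, one option is to clamp the negative gap instead 
of throwing (just a sketch, not a required change):

{code:title=NodeHealthCheckerService.java (sketch)}
long diskCheckTime = dirsHandler.getLastDisksCheckTime();
long gap = System.currentTimeMillis() - diskCheckTime;
if (gap < 0) {
  // small negative values can show up when the thread migrates between CPUs;
  // treat the check as "just happened" rather than aborting the NodeManager
  gap = 0;
}
{code}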

3. Please move the validation of the configuration to serviceInit to avoid aborting 
at runtime.
{code:title=NodeHealthCheckerService.java}
+long allowedGap = this.diskHealthCheckInterval + 
this.diskHealthCheckTimeout;
+if (allowedGap <= 0) {
+  throw new AssertionError("implementation error - interval=" + 
this.diskHealthCheckInterval
+  + ", timeout=" + this.diskHealthCheckTimeout);
+}
{code}
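
Something along these lines in serviceInit would fail fast at startup instead (a 
sketch; {{DISK_HEALTH_CHECK_TIMEOUT_MS_KEY}} and its default stand in for whatever 
new configuration key the patch introduces):

{code:title=NodeHealthCheckerService.java (sketch)}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  this.diskHealthCheckInterval = conf.getLong(
      YarnConfiguration.NM_DISK_HEALTH_CHECK_INTERVAL_MS,
      YarnConfiguration.DEFAULT_NM_DISK_HEALTH_CHECK_INTERVAL_MS);
  // hypothetical key for the new timeout added by this patch
  this.diskHealthCheckTimeout = conf.getLong(DISK_HEALTH_CHECK_TIMEOUT_MS_KEY,
      DEFAULT_DISK_HEALTH_CHECK_TIMEOUT_MS);
  if (this.diskHealthCheckInterval + this.diskHealthCheckTimeout <= 0) {
    // reject bad settings at startup instead of aborting later at runtime
    throw new YarnRuntimeException("Invalid disk health check settings: interval="
        + this.diskHealthCheckInterval + ", timeout=" + this.diskHealthCheckTimeout);
  }
  super.serviceInit(conf);
}
{code}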


> NM disk health checker should have a timeout
> 
>
> Key: YARN-4301
> URL: https://issues.apache.org/jira/browse/YARN-4301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akihiro Suda
> Attachments: YARN-4301-1.patch, YARN-4301-2.patch, 
> concept-async-diskchecker.txt
>
>
> The disk health checker [verifies a 
> disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385]
>  by executing {{mkdir}} and {{rmdir}} periodically.
> If these operations do not return within a moderate timeout, the disk should be 
> marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}.
> I confirmed that current YARN does not have such an implicit timeout (on JDK7, 
> Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our 
> fault injector for distributed systems.
> (I'll share the reproduction script shortly.)
> I think we can fix this issue by making 
> [{{NodeHealthCheckerService.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73]
>  return {{false}} if the value of {{this.getLastHealthReportTime()}} is too 
> old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4421) Remove dead code in RmAppImpl.RMAppRecoveredTransition

2015-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048215#comment-15048215
 ] 

Hudson commented on YARN-4421:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8946 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8946/])
YARN-4421. Remove dead code in RmAppImpl.RMAppRecoveredTransition. 
(rohithsharmaks: rev a5e2e1ecb06a3942903cb79f61f0f4bb02480f19)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> Remove dead code in RmAppImpl.RMAppRecoveredTransition
> --
>
> Key: YARN-4421
> URL: https://issues.apache.org/jira/browse/YARN-4421
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4421.001.patch
>
>
> The {{transition()}} method contains the following:
> {code}
>   // Last attempt is in final state, return ACCEPTED waiting for last
>   // RMAppAttempt to send finished or failed event back.
>   if (app.currentAttempt != null
>   && (app.currentAttempt.getState() == RMAppAttemptState.KILLED
>   || app.currentAttempt.getState() == RMAppAttemptState.FINISHED
>   || (app.currentAttempt.getState() == RMAppAttemptState.FAILED
>   && app.getNumFailedAppAttempts() == app.maxAppAttempts))) {
> return RMAppState.ACCEPTED;
>   }
>   // YARN-1507 is saving the application state after the application is
>   // accepted. So after YARN-1507, an app is saved meaning it is accepted.
>   // Thus we return ACCECPTED state on recovery.
>   return RMAppState.ACCEPTED;
> {code}
> The {{if}} statement is fully redundant and can be eliminated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-08 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-4434:

Attachment: YARN-4434.branch-2.6.patch

I had to rebase the patch for branch-2.6. Attaching the rebased patch.

> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-08 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-4434:

Affects Version/s: 2.6.0
   Labels:   (was: documentation)
 Hadoop Flags: Reviewed
  Component/s: documentation

> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-4434.001.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-08 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048194#comment-15048194
 ] 

Akira AJISAKA commented on YARN-4434:
-

Thanks [~bwtakacy] and [~cheersyang]. I'm +1 for Weiwei's patch. Committing 
this.

> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
>  Labels: documentation
> Attachments: YARN-4434.001.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4421) Remove dead code in RmAppImpl.RMAppRecoveredTransition

2015-12-08 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4421:

  Priority: Trivial  (was: Minor)
Issue Type: Bug  (was: Improvement)

> Remove dead code in RmAppImpl.RMAppRecoveredTransition
> --
>
> Key: YARN-4421
> URL: https://issues.apache.org/jira/browse/YARN-4421
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4421.001.patch
>
>
> The {{transition()}} method contains the following:
> {code}
>   // Last attempt is in final state, return ACCEPTED waiting for last
>   // RMAppAttempt to send finished or failed event back.
>   if (app.currentAttempt != null
>   && (app.currentAttempt.getState() == RMAppAttemptState.KILLED
>   || app.currentAttempt.getState() == RMAppAttemptState.FINISHED
>   || (app.currentAttempt.getState() == RMAppAttemptState.FAILED
>   && app.getNumFailedAppAttempts() == app.maxAppAttempts))) {
> return RMAppState.ACCEPTED;
>   }
>   // YARN-1507 is saving the application state after the application is
>   // accepted. So after YARN-1507, an app is saved meaning it is accepted.
>   // Thus we return ACCECPTED state on recovery.
>   return RMAppState.ACCEPTED;
> {code}
> The {{if}} statement is fully redundant and can be eliminated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048106#comment-15048106
 ] 

Li Lu commented on YARN-4356:
-

Latest patch LGTM. +1 pending Jenkins. I'll wait for one more day and if 
there's no objection I'll commit it. 

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4421) Remove dead code in RmAppImpl.RMAppRecoveredTransition

2015-12-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048093#comment-15048093
 ] 

Rohith Sharma K S commented on YARN-4421:
-

Initially, in the RM restart feature, there was some code handling functionality 
between those two lines. Later on, because of improvements and bug fixes, that code 
was removed, and what remains now looks like dead code. It can be removed now.
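
In other words, after dropping the dead {{if}} block, the recovery transition boils 
down to this (a sketch of the end state, not the literal patch):

{code:title=RMAppImpl.RMAppRecoveredTransition (sketch after the cleanup)}
// YARN-1507 saves the application state only after the application has been
// accepted, so any app we recover from the state store can be reported as ACCEPTED.
return RMAppState.ACCEPTED;
{code}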

> Remove dead code in RmAppImpl.RMAppRecoveredTransition
> --
>
> Key: YARN-4421
> URL: https://issues.apache.org/jira/browse/YARN-4421
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-4421.001.patch
>
>
> The {{transition()}} method contains the following:
> {code}
>   // Last attempt is in final state, return ACCEPTED waiting for last
>   // RMAppAttempt to send finished or failed event back.
>   if (app.currentAttempt != null
>   && (app.currentAttempt.getState() == RMAppAttemptState.KILLED
>   || app.currentAttempt.getState() == RMAppAttemptState.FINISHED
>   || (app.currentAttempt.getState() == RMAppAttemptState.FAILED
>   && app.getNumFailedAppAttempts() == app.maxAppAttempts))) {
> return RMAppState.ACCEPTED;
>   }
>   // YARN-1507 is saving the application state after the application is
>   // accepted. So after YARN-1507, an app is saved meaning it is accepted.
>   // Thus we return ACCECPTED state on recovery.
>   return RMAppState.ACCEPTED;
> {code}
> The {{if}} statement is fully redundant and can be eliminated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4431) Not necessary to do unRegisterNM() if NM get stop due to failed to connect to RM

2015-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048088#comment-15048088
 ] 

Hudson commented on YARN-4431:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8945 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8945/])
YARN-4431. Not necessary to do unRegisterNM() if NM get stop due to 
(rohithsharmaks: rev 15c3e7ffe3d1c57ad36afd993f09fc47889c93bd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java


> Not necessary to do unRegisterNM() if NM get stop due to failed to connect to 
> RM
> 
>
> Key: YARN-4431
> URL: https://issues.apache.org/jira/browse/YARN-4431
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-4431.patch
>
>
> {noformat}
> 2015-12-07 12:16:57,873 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,874 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,876 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
> Unregistration of the Node 10.200.10.53:25454 failed.
> java.net.ConnectException: Call From jduMBP.local/10.200.10.53 to 
> 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> at org.apache.hadoop.ipc.Client.call(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1385)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy74.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.unRegisterNodeManager(ResourceTrackerPBClientImpl.java:98)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at com.sun.proxy.$Proxy75.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:267)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStop(NodeStatusUpdaterImpl.java:245)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:377)
> {noformat}
> If the RM is down for some reason, the NM's NodeStatusUpdaterImpl will retry the 
> connection with the proper retry policy. After retrying the maximum number of times 
> (15 minutes by default), it will send NodeManagerEventType.SHUTDOWN to shut down the 
> NM. But the NM shutdown will call NodeStatusUpdaterImpl.serviceStop(), which will 
> call unRegisterNM() to unregister the NM from the RM and retry again (another 15 
> minutes). This is completely unnecessary, and we should skip unRegisterNM() when the 
> NM gets shut down because of connection issues.
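
A minimal sketch of the intended fix ({{failedToConnect}} is a hypothetical flag used 
here for illustration; the committed patch may track this differently):

{code:title=NodeStatusUpdaterImpl.java (sketch)}
@Override
protected void serviceStop() throws Exception {
  // skip unregistration when we are stopping because the RM was unreachable;
  // calling unRegisterNM() would only hit the same connection retries again
  if (!this.failedToConnect) {
    unRegisterNM();
  }
  // ... stop the status updater thread and the rest of the service as before
  super.serviceStop();
}
{code}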



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048082#comment-15048082
 ] 

Hadoop QA commented on YARN-4225:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
26s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 50, now 50). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 40s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
introduced 1 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 3s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 3s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 22s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with

[jira] [Updated] (YARN-4431) Not necessary to do unRegisterNM() if NM get stop due to failed to connect to RM

2015-12-08 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4431:

Component/s: nodemanager

> Not necessary to do unRegisterNM() if NM get stop due to failed to connect to 
> RM
> 
>
> Key: YARN-4431
> URL: https://issues.apache.org/jira/browse/YARN-4431
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-4431.patch
>
>
> {noformat}
> 2015-12-07 12:16:57,873 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,874 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,876 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
> Unregistration of the Node 10.200.10.53:25454 failed.
> java.net.ConnectException: Call From jduMBP.local/10.200.10.53 to 
> 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> at org.apache.hadoop.ipc.Client.call(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1385)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy74.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.unRegisterNodeManager(ResourceTrackerPBClientImpl.java:98)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at com.sun.proxy.$Proxy75.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:267)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStop(NodeStatusUpdaterImpl.java:245)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:377)
> {noformat}
> If the RM is down for some reason, the NM's NodeStatusUpdaterImpl will retry the 
> connection with the proper retry policy. After retrying the maximum number of times 
> (15 minutes by default), it will send NodeManagerEventType.SHUTDOWN to shut down the 
> NM. But the NM shutdown will call NodeStatusUpdaterImpl.serviceStop(), which will 
> call unRegisterNM() to unregister the NM from the RM and retry again (another 15 
> minutes). This is completely unnecessary, and we should skip unRegisterNM() when the 
> NM gets shut down because of connection issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4356:
--
Attachment: YARN-4356-feature-YARN-2928.004.patch

Posted patch v.4 that addresses Li's comments.

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4194) Extend Reservation Definition Langauge (RDL) extensions to support node labels

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048069#comment-15048069
 ] 

Hadoop QA commented on YARN-4194:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 37s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 2s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 3s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 36s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 17, now 19). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 43s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 6s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 36s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 51s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 49s {color} 
| {color:black} {color} |
\\
\\
|| Subsys

[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048067#comment-15048067
 ] 

Sangjin Lee commented on YARN-4356:
---

Oh I see. Yes, there is no NM metrics publisher in ATS v.1.x, so it should be 
fine. Thanks for that.

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048057#comment-15048057
 ] 

Li Lu commented on YARN-4356:
-

Thanks [~sjlee0]!
bq. The v.1 behavior should be essentially the same as today. The existing v.1 
behavior is to check TIMELINE_SERVICE_ENABLED and then 
RM_SYSTEM_METRICS_PUBLISHER_ENABLED. You'll see that the current patch checks 
TIMELINE_SERVICE_ENABLED, TIMELINE_SERVICE_VERSION == 1, and 
RM_SYSTEM_METRICS_PUBLISHER_ENABLED. I do see that it's checking strictly for 
version = 1. I'll change it to check for version < 2 so it can match 1.5 as 
well.

Sorry about the confusion here but I was talking about this part of the code:
{code}
// initialize the metrics publisher if the timeline service v.2 is enabled
// and the system publisher is enabled
Configuration conf = context.getConf();
if (YarnConfiguration.timelineServiceV2Enabled(conf) &&
    YarnConfiguration.systemMetricsPublisherEnabled(conf)) {
  LOG.info("YARN system metrics publishing service is enabled");
  nmMetricsPublisher = createNMTimelinePublisher(context);
  context.setNMTimelinePublisher(nmMetricsPublisher);
}
{code}

Looks like in the ATS v1.x branch we don't have the nmMetricsPublisher, so it should be 
fine? Just want to double-check this part.

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4431) Not necessary to do unRegisterNM() if NM get stop due to failed to connect to RM

2015-12-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048052#comment-15048052
 ] 

Rohith Sharma K S commented on YARN-4431:
-

committing shortly

> Not necessary to do unRegisterNM() if NM get stop due to failed to connect to 
> RM
> 
>
> Key: YARN-4431
> URL: https://issues.apache.org/jira/browse/YARN-4431
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4431.patch
>
>
> {noformat}
> 2015-12-07 12:16:57,873 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,874 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,876 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
> Unregistration of the Node 10.200.10.53:25454 failed.
> java.net.ConnectException: Call From jduMBP.local/10.200.10.53 to 
> 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> at org.apache.hadoop.ipc.Client.call(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1385)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy74.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.unRegisterNodeManager(ResourceTrackerPBClientImpl.java:98)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at com.sun.proxy.$Proxy75.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:267)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStop(NodeStatusUpdaterImpl.java:245)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:377)
> {noformat}
> If the RM is down for some reason, the NM's NodeStatusUpdaterImpl will retry the 
> connection with the proper retry policy. After retrying the maximum number of times 
> (15 minutes by default), it will send NodeManagerEventType.SHUTDOWN to shut down the 
> NM. But the NM shutdown will call NodeStatusUpdaterImpl.serviceStop(), which will 
> call unRegisterNM() to unregister the NM from the RM and retry again (another 15 
> minutes). This is completely unnecessary, and we should skip unRegisterNM() when the 
> NM gets shut down because of connection issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048049#comment-15048049
 ] 

Sangjin Lee commented on YARN-4356:
---

Thanks for your review [~gtCarrera9].

bq. I noticed in some files we're verifying v2 in a hard-coded fashion (version 
== 2). Why do we still need this especially when we have 
timelineServiceV2Enabled()?

The only reason for them is that timelineServiceV2Enabled() is 
timelineServiceEnabled() + (timelineServiceVersion == 2). In those cases, 
timelineServiceEnabled() was already checked, so as a (small) optimization I 
just checked the version directly. Having said that, I'm comfortable with 
changing them to call timelineServiceV2Enabled() even though it may check 
timelineServiceEnabled() one extra time. I'll make those changes.
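
For readers following along, my mental model of the helper relationship is roughly 
this (a sketch, not a verbatim copy of YarnConfiguration):

{code:title=sketch of the helper relationship}
public static boolean timelineServiceV2Enabled(Configuration conf) {
  // the v2 helper is just the generic "enabled" check plus an exact version match
  return timelineServiceEnabled(conf)
      && (int) getTimelineServiceVersion(conf) == 2;
}
{code}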

bq. That is, if the timeline-service.version is set to 2.x in future, are the 
applications allowed to use other versions of ATS?

It should be possible in principle with the assumption that some compatibility 
mechanism is in place so an old API invocation can succeed. The config is there 
to discover what's running on the cluster. If there is a compatibility 
mechanism, applications may invoke a different API (it's entirely up to them at 
that point).

bq. ApplicationMaster, function names "...OnNewTimelineService" can be more 
specific like "...V2"?

Sounds good. I didn't rename methods as part of this work, but let me see if I 
can rename them to use "v2".

bq. ContainerManagerImpl, I just want to double check one behavior: the SMP is 
enabled for the NM only when timeline version is v2 and SMP is enabled in the 
config? What about v1.x versions? If this is a v2 only feature, shall we 
clarify that in the log message?

The v.1 behavior should be essentially the same as today. The existing v.1 
behavior is to check TIMELINE_SERVICE_ENABLED and then 
RM_SYSTEM_METRICS_PUBLISHER_ENABLED. You'll see that the current patch checks 
TIMELINE_SERVICE_ENABLED, TIMELINE_SERVICE_VERSION == 1, and 
RM_SYSTEM_METRICS_PUBLISHER_ENABLED. I do see that it's checking strictly for 
version = 1. I'll change it to check for version < 2 so it can match 1.5 as 
well.
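
Concretely, the relaxed v.1 publisher check would then look something like this (a 
sketch assuming the usual YarnConfiguration keys; exact names may differ on the branch):

{code:title=sketch of the relaxed version check}
boolean publishV1Metrics =
    conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)
    // accept any v1.x timeline service (1.0, 1.5, ...) rather than exactly 1
    && conf.getFloat(YarnConfiguration.TIMELINE_SERVICE_VERSION,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_VERSION) < 2.0f
    && conf.getBoolean(YarnConfiguration.RM_SYSTEM_METRICS_PUBLISHER_ENABLED,
        YarnConfiguration.DEFAULT_RM_SYSTEM_METRICS_PUBLISHER_ENABLED);
{code}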

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4431) Not necessary to do unRegisterNM() if NM get stop due to failed to connect to RM

2015-12-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048043#comment-15048043
 ] 

Rohith Sharma K S commented on YARN-4431:
-

+1 lgtm

> Not necessary to do unRegisterNM() if NM get stop due to failed to connect to 
> RM
> 
>
> Key: YARN-4431
> URL: https://issues.apache.org/jira/browse/YARN-4431
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4431.patch
>
>
> {noformat}
> 2015-12-07 12:16:57,873 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,874 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,876 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
> Unregistration of the Node 10.200.10.53:25454 failed.
> java.net.ConnectException: Call From jduMBP.local/10.200.10.53 to 
> 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> at org.apache.hadoop.ipc.Client.call(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1385)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy74.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.unRegisterNodeManager(ResourceTrackerPBClientImpl.java:98)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at com.sun.proxy.$Proxy75.unRegisterNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:267)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStop(NodeStatusUpdaterImpl.java:245)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:377)
> {noformat}
> If the RM is down for some reason, the NM's NodeStatusUpdaterImpl will retry the 
> connection with the proper retry policy. After retrying the maximum number of times 
> (15 minutes by default), it will send NodeManagerEventType.SHUTDOWN to shut down the 
> NM. But the NM shutdown will call NodeStatusUpdaterImpl.serviceStop(), which will 
> call unRegisterNM() to unregister the NM from the RM and retry again (another 15 
> minutes). This is completely unnecessary, and we should skip unRegisterNM() when the 
> NM gets shut down because of connection issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4424) Fix deadlock in RMAppImpl

2015-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048029#comment-15048029
 ] 

Hudson commented on YARN-4424:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #677 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/677/])
YARN-4424. Fix deadlock in RMAppImpl. (Jian he via wangda) (wangda: rev 
7e4715186d31ac889fba26d453feedcebb11fc70)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> Fix deadlock in RMAppImpl
> -
>
> Key: YARN-4424
> URL: https://issues.apache.org/jira/browse/YARN-4424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
>Priority: Blocker
> Fix For: 2.7.2, 2.6.3
>
> Attachments: YARN-4424.1.patch
>
>
> {code}
> yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn 
> application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
> 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: 
> http://XXX:8188/ws/v1/timeline/
> 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at 
> XXX/0.0.0.0:8050
> 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History 
> server at XXX/0.0.0.0:10200
> {code}
> {code:title=RM log}
> 2015-12-04 21:59:19,744 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000
> 2015-12-04 22:00:50,945 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000
> 2015-12-04 22:02:22,416 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000
> 2015-12-04 22:03:53,593 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 24
> 2015-12-04 22:05:24,856 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000
> 2015-12-04 22:06:56,235 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000
> 2015-12-04 22:08:27,510 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000
> 2015-12-04 22:09:58,786 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048030#comment-15048030
 ] 

Hudson commented on YARN-4248:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #677 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/677/])
YARN-4248. Followup patch adding asf-licence exclusions for json test 
(cdouglas: rev 9f50e13d5dc329c3a6df7f9bcaf2f29b35adc52b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml


> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4417) Make RM and Timeline-server REST APIs more consistent

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047995#comment-15047995
 ] 

Hadoop QA commented on YARN-4417:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 49, now 52). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
25s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 22s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 8s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Not

[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047987#comment-15047987
 ] 

Li Lu commented on YARN-4234:
-

Patch generally LGTM. The only issue is that if the NameNode is unavailable and 
retry is not set, the timeline client will quickly exhaust its retries and then fail. 
This will cause either application attempts to fail, or the RM to fail to start. 
Maybe we can try a mechanism like the one in FileSystemRMStateStore#startInternal, 
where we explicitly set the related retry policy configs? 

Other than this corner-case issue I'm fine with this patch. Right now people 
are reaching agreement on YARN-3623, so it can probably go in very 
soon. That said, could some committers please review the current patch? Thanks! 
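
For reference, a minimal sketch of the FileSystemRMStateStore-style approach mentioned above, assuming the standard DFS client retry properties; the helper name and the concrete retry values are illustrative only, not part of the patch under review.

{code:title=Sketch: bounding the DFS client retry policy before creating the FileSystem (illustrative)}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TimelineFsRetrySketch {
  public static FileSystem openWithBoundedRetries(Configuration conf, Path activeDir)
      throws IOException {
    // Copy the configuration so the overrides stay local to this FileSystem instance.
    Configuration fsConf = new Configuration(conf);
    // Force a bounded DFS client retry policy, so an unavailable NameNode leads to a
    // bounded wait instead of failing almost immediately.
    fsConf.setBoolean("dfs.client.retry.policy.enabled", true);
    // Pairs of "<sleep-ms>,<retries>"; the values here are an arbitrary example.
    fsConf.set("dfs.client.retry.policy.spec", "2000,2,30000,2");
    return activeDir.getFileSystem(fsConf);
  }
}
{code}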

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4301) NM disk health checker should have a timeout

2015-12-08 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047931#comment-15047931
 ] 

Akihiro Suda commented on YARN-4301:


The warning is for {{concept-async-diskchecker.txt}}, which is just a concept 
document, not a patch.

I didn't know that Yetus recognizes {{*.txt}} files as patches.



> NM disk health checker should have a timeout
> 
>
> Key: YARN-4301
> URL: https://issues.apache.org/jira/browse/YARN-4301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akihiro Suda
> Attachments: YARN-4301-1.patch, YARN-4301-2.patch, 
> concept-async-diskchecker.txt
>
>
> The disk health checker [verifies a 
> disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385]
>  by executing {{mkdir}} and {{rmdir}} periodically.
> If these operations do not return within a moderate timeout, the disk should be 
> marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}.
> I confirmed that current YARN does not have an implicit timeout (on JDK7, 
> Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our 
> fault injector for distributed systems.
> (I'll introduce the reproduction script in a while)
> I consider we can fix this issue by making 
> [{{NodeHealthCheckerServer.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73]
>  return {{false}} if the value of {{this.getLastHealthReportTime()}} is too 
> old.
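
A minimal, self-contained sketch of the staleness check proposed here; the field names and the two-minute budget are assumptions for illustration, not the actual Hadoop code.

{code:title=Sketch: treat a stale disk-checker report as unhealthy (illustrative)}
public class StaleHealthReportSketch {

  // Assumed budget: if no disk-checker report for 2 minutes, consider the node unhealthy.
  static final long MAX_REPORT_AGE_MS = 2 * 60 * 1000L;

  private volatile long lastHealthReportTime = System.currentTimeMillis();
  private volatile boolean dirsHealthy = true;

  // Called by the periodic disk checker whenever a mkdir/rmdir scan completes.
  void onDiskScanFinished(boolean healthy) {
    dirsHealthy = healthy;
    lastHealthReportTime = System.currentTimeMillis();
  }

  // Analogue of NodeHealthCheckerService.isHealthy(): a stale report flips the
  // node to unhealthy even if a hung mkdir/rmdir never produced an explicit result.
  boolean isHealthy() {
    long age = System.currentTimeMillis() - lastHealthReportTime;
    return dirsHealthy && age <= MAX_REPORT_AGE_MS;
  }
}
{code}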



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047912#comment-15047912
 ] 

Li Lu commented on YARN-4356:
-

Hi [~sjlee0], thanks for the work! Mostly LGTM, just a few things to check:
1. I noticed that in some files we're verifying v2 in a hard-coded fashion (version 
== 2). Why do we still need this, especially when we have 
timelineServiceV2Enabled()? 
2. MapRed will use the timeline.version config as the current active API 
version. I'm fine with this design. One thing to check: do we allow other 
applications to customize the active API version for themselves? That is, if 
the timeline-service.version is set to 2.x in the future, are applications 
allowed to use other versions of ATS? (I think in this case the compatibility 
story should be handled by the application itself?)
3. In ApplicationMaster, could the function names "...OnNewTimelineService" be more 
specific, like "...V2"?
4. In ContainerManagerImpl, I just want to double-check one behavior: is the SMP 
enabled for the NM only when the timeline version is v2 and SMP is enabled in the 
config? What about v1.x versions? If this is a v2-only feature, shall we 
clarify that in the log message? 

Thanks! 
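
As a rough illustration of the kind of centralized gate point 1 refers to, here is a minimal sketch built on the {{yarn.timeline-service.enabled}} and {{yarn.timeline-service.version}} settings; the helper below is an assumption for illustration and may differ from the actual branch code.

{code:title=Sketch: a centralized timeline version gate (illustrative)}
import org.apache.hadoop.conf.Configuration;

public final class TimelineVersionGate {
  private TimelineVersionGate() {}

  public static boolean timelineServiceEnabled(Configuration conf) {
    return conf.getBoolean("yarn.timeline-service.enabled", false);
  }

  public static float timelineServiceVersion(Configuration conf) {
    return conf.getFloat("yarn.timeline-service.version", 1.0f);
  }

  public static boolean timelineServiceV2Enabled(Configuration conf) {
    // Any 2.x value counts as v2; anything below 2.0 is treated as v1/v1.5.
    // Callers check this instead of hard-coding "version == 2" comparisons.
    return timelineServiceEnabled(conf)
        && (int) timelineServiceVersion(conf) == 2;
  }
}
{code}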

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-08 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047909#comment-15047909
 ] 

Xianyin Xin commented on YARN-4415:
---

+1 for the general idea from [~Naganarasimha]. I think there are many improper 
handlings in the code, especially when dealing with the label "*", for example 
{{setupQueueConfigs()}} in {{AbstractCSQueue}}, and 
{{PartitionQueueCapacitiesInfo}} and {{QueueCapacitiesInfo}} when returning the 
actual capacities. I suggest you just upload a preview patch so that the 
problem can be exposed in another way; does that sound feasible?

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> 
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png, 
> capacity-scheduler.xml, screenshot-1.png
>
>
> Steps to reproduce the issue:
> Scenario 1:
> # Configure a queue (default) with accessible node labels set to *
> # Create an exclusive partition *xxx* and map an NM to it
> # Ensure no capacities are configured for the default queue for label xxx
> # Start an RM app with queue default and label xxx
> # The application is stuck, but the scheduler UI shows 100% as the max capacity 
> for that queue
> Scenario 2:
> # Create a non-exclusive partition *sharedPartition* and map an NM to it
> # Ensure no capacities are configured for the default queue
> # Start an RM app with queue *default* and label *sharedPartition*
> # The application is stuck, but the scheduler UI shows 100% as the max capacity 
> for that queue for *sharedPartition*
> For both issues the cause is the same: the default max capacity and absolute max 
> capacity are set to zero %
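
For reference, an illustrative {{capacity-scheduler.xml}} fragment showing the kind of per-label capacities whose absence produces the zero effective max capacity described above; the property names follow the standard node-label configuration pattern, and the queue and label names are taken from scenario 1, so treat this as a sketch rather than the exact fix.

{code:title=capacity-scheduler.xml (illustrative fragment)}
<!-- Give the default queue an explicit capacity for label "xxx". -->
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>xxx</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.xxx.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.xxx.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.xxx.maximum-capacity</name>
  <value>100</value>
</property>
{code}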



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4301) NM disk health checker should have a timeout

2015-12-08 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047902#comment-15047902
 ] 

Tsuyoshi Ozawa commented on YARN-4301:
--

[~suda] thank you for updating. The warning by findbugs looks related to the 
change. Could you fix it?

> NM disk health checker should have a timeout
> 
>
> Key: YARN-4301
> URL: https://issues.apache.org/jira/browse/YARN-4301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akihiro Suda
> Attachments: YARN-4301-1.patch, YARN-4301-2.patch, 
> concept-async-diskchecker.txt
>
>
> The disk health checker [verifies a 
> disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385]
>  by executing {{mkdir}} and {{rmdir}} periodically.
> If these operations do not return within a moderate timeout, the disk should be 
> marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}.
> I confirmed that current YARN does not have an implicit timeout (on JDK7, 
> Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our 
> fault injector for distributed systems.
> (I'll introduce the reproduction script in a while)
> I consider we can fix this issue by making 
> [{{NodeHealthCheckerServer.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73]
>  return {{false}} if the value of {{this.getLastHealthReportTime()}} is too 
> old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4340) Add "list" API to reservation system

2015-12-08 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4340:
--
Attachment: YARN-4340.v7.patch

This patch addresses the remaining findbugs and checkstyle errors.

The following unit tests are failing and have associated Jira tickets:
hadoop.yarn.server.resourcemanager.TestClientRMTokens 
-- YARN-4306
hadoop.yarn.server.resourcemanager.TestAMAuthorization 
-- YARN-4318
hadoop.yarn.client.TestGetGroups 
-- YARN-4351

The following unit tests pass locally; their flakiness may be related to 
YARN-4352:
org.apache.hadoop.yarn.client.api.impl.TestYarnClient
-- testShouldNotRetryForeverForNonNetworkExceptions also fails locally on trunk
-- testAMMRToken passes locally on trunk
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
org.apache.hadoop.yarn.client.api.impl.TestNMClient

The following tests are passing locally:
hadoop.mapreduce.v2.TestMRJobsWithProfiler

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch, 
> YARN-4340.v6.patch, YARN-4340.v7.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system on which reservation exists by "time-range, 
> reservation-id".
> YARN-4420 has a dependency on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4341) add doc about timeline performance tool usage

2015-12-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047879#comment-15047879
 ] 

Sangjin Lee commented on YARN-4341:
---

Sorry [~lichangleo] it took me a while to get to this. Some corrections and 
suggestions below.

(Highlights)
- "Timeline..." -> "The timeline..."
- "help measure..." -> "helps measure..."
- "Test will launch..." -> "The test launches..."
- "JobHistoryFileReplay mapper" -> "JobHistoryFileReplay mappers"
- ".. to timeline server." -> "... to the timeline server."
- "In the end," -> "At the end,"
- "transaction rate(ops/s)" -> "the transaction rate (ops/s)"
- "and transaction rate in total" -> "and the total transaction rate"
- "print out" -> "printed out"
- "To run the test..." -> "Running the test..."
- "IO rate(KB/s)" -> "the I/O rate (KB/s)"
- "IO rate total." -> "the total I/O rate."

(Usages)
- "Usages" -> "Usage"
- "Each mapper write user specified number of timeline entities to 
timelineserver and each timeline entity is created with user specified size." 
-> "Each mapper writes a user-specified number of timeline entities with a 
user-specified size to the timeline server."
- "Each mappe replay..." -> "Each mapper replays..."
- "... to be replayed. suggest to launch mappers no more than..." -> "...to be 
replayed; the number of mappers should be no more than..."

> add doc about timeline performance tool usage
> -
>
> Key: YARN-4341
> URL: https://issues.apache.org/jira/browse/YARN-4341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4341.2.patch, YARN-4341.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4194) Extend Reservation Definition Language (RDL) extensions to support node labels

2015-12-08 Thread Alexey Tumanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Tumanov updated YARN-4194:
-
Attachment: YARN-4194-v1.patch

Patch now attached.

> Extend Reservation Definition Language (RDL) extensions to support node labels
> --
>
> Key: YARN-4194
> URL: https://issues.apache.org/jira/browse/YARN-4194
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Alexey Tumanov
> Attachments: YARN-4194-v1.patch
>
>
> This JIRA tracks changes to the APIs to the reservation system to support
> the expressivity of node-labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3368) [Umbrella] Improve YARN web UI

2015-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047794#comment-15047794
 ] 

Wangda Tan commented on YARN-3368:
--

Also I've renamed this JIRA to an umbrella JIRA for efforts of YARN web UI 
improvements. Please feel free to file tickets for bugs/features.

> [Umbrella] Improve YARN web UI
> --
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015)) 
> yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3368) [Umbrella] Improve YARN web UI

2015-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047793#comment-15047793
 ] 

Wangda Tan commented on YARN-3368:
--

Thanks,

I've created YARN-3368 branch and committed patches to it. You can follow steps 
in {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/README.md}} to try this 
patch.

> [Umbrella] Improve YARN web UI
> --
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015)) 
> yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3368) [Umbrella] Improve YARN web UI

2015-12-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Summary: [Umbrella] Improve YARN web UI  (was: Improve YARN web UI)

> [Umbrella] Improve YARN web UI
> --
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015)) 
> yarn-ui-screenshots.zip
>
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-12-08 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047788#comment-15047788
 ] 

Ivan Mitic commented on YARN-4309:
--

Thanks [~vvasudev]. Latest patch looks good, I am +1 on the Windows side 
changes. Please also have someone actively working on Yarn to +1 on the overall 
approach and Linux side. 

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, 
> YARN-4309.009.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.
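
A rough, self-contained sketch of the proposal (file names and the helper are assumptions, not the committed patch): write a listing of the container local dir into the container log dir and copy {{launch_container.sh}} next to it, so log aggregation picks both up.

{code:title=Sketch: dumping container debug info into the log dir (illustrative)}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ContainerDebugDumpSketch {
  public static void dumpDebugInfo(Path containerLocalDir, Path containerLogDir)
      throws IOException {
    // Write a recursive listing of the container's local dir into the log dir.
    List<String> listing;
    try (Stream<Path> files = Files.walk(containerLocalDir)) {
      listing = files.map(Path::toString).collect(Collectors.toList());
    }
    Files.write(containerLogDir.resolve("directory.info"), listing);

    // Copy the generated launch script so it is aggregated along with the logs.
    Path launchScript = containerLocalDir.resolve("launch_container.sh");
    if (Files.exists(launchScript)) {
      Files.copy(launchScript, containerLogDir.resolve("launch_container.sh"),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}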



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4417) Make RM and Timeline-server REST APIs more consistent

2015-12-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4417:
-
Attachment: YARN-4417.2.patch

Attached ver.2 patch fixed test failures.

> Make RM and Timeline-server REST APIs more consistent
> -
>
> Key: YARN-4417
> URL: https://issues.apache.org/jira/browse/YARN-4417
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4417.1.patch, YARN-4417.2.patch
>
>
> There're some differences between RM and timeline-server's REST APIs, for 
> example, RM REST API doesn't support get application attempt info by app-id 
> and attempt-id but timeline server supports. We could make them more 
> consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047758#comment-15047758
 ] 

Li Lu commented on YARN-3623:
-

bq. it may be sufficient to note it here and carry on that discussion on a v.2 
subtask. Sound good?
Agree. We can further investigate this issue in the v2 branch. I'm also fine 
with the current config name in [~xgong]'s patch. 

> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch
>
>
> So far RM, MR AM, DA AM added/changed new config to enable the feature to 
> write the timeline data to v2 server. It's good to have a YARN 
> timeline-service.version config like timeline-service.enable to indicate the 
> version of the running timeline service with the given YARN cluster. It's 
> beneficial for users to more smoothly move from v1 to v2, as they don't need 
> to change the existing config, but switch this config from v1 to v2. And each 
> framework doesn't need to have their own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-12-08 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047742#comment-15047742
 ] 

Lin Yiqun commented on YARN-4381:
-

[~djp], the Jenkins report shows that the checkstyle warnings do not need to be 
fixed and the license warnings look unrelated. Could you review my patch again?

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch, YARN-4381.002.patch
>
>
> Recently, I found an issue with a NodeManager metric: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number of 
> successfully launched containers. Sometimes a launch fails, for example when a 
> kill command is received or container localization fails, which leads to a 
> failed container. But currently this counter is incremented in the code below 
> whenever the container is started, whether it succeeds or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a 
> container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.
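
A minimal sketch of the accounting being proposed (plain counters for illustration; the real {{NodeManagerMetrics}} uses the metrics2 library and the method names here are assumptions): increment the launched counter only when the container actually launches, and count localization failures separately.

{code:title=Sketch: counting launches and localization failures separately (illustrative)}
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerMetricsSketch {
  private final AtomicInteger containersLaunched = new AtomicInteger();
  private final AtomicInteger containersLaunchFailed = new AtomicInteger();
  private final AtomicInteger containersLocalizationFailed = new AtomicInteger();

  // Call from the code path that actually transitions the container to RUNNING,
  // not from startContainers(), so killed or failed-to-localize containers are excluded.
  public void launchedContainer() { containersLaunched.incrementAndGet(); }

  public void failedLaunchContainer() { containersLaunchFailed.incrementAndGet(); }

  // The new metric suggested in the description above.
  public void localizationFailedContainer() { containersLocalizationFailed.incrementAndGet(); }

  public int getContainersLaunched() { return containersLaunched.get(); }
  public int getContainersLocalizationFailed() { return containersLocalizationFailed.get(); }
}
{code}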



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047738#comment-15047738
 ] 

Hadoop QA commented on YARN-4356:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
45s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
13s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} 
|
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 40s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
9s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 17s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 2m 
33s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 21s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
feature-YARN-2928 has 3 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 32s 
{color} | {color:red} hadoop-yarn-common in feature-YARN-2928 failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in feature-YARN-2928 
failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 39s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 51s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 24m 41s 
{color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 5 new 
issues (was 779, now 779). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 34m 11s 
{color} | {color:red} root-jdk1.7.0_85 with JDK v1.7.0_85 generated 5 new 
issues (was 772, now 772). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 3s 
{color} | {color:red} Patch generated 26 new checkstyle issues in root (total 
was 1937, now 1931). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 2m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 
43s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 35s 
{color} | {color:red} hadoop-yarn-common in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-ap

[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047702#comment-15047702
 ] 

Hadoop QA commented on YARN-3946:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 28 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 654, now 678). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
28s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 55s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 28s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 155m 36s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | ha

[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047695#comment-15047695
 ] 

Sangjin Lee commented on YARN-3623:
---

I agree the rolling upgrade use case from v.1 to v.2 should be addressed. We 
had some offline discussion on this too. Since it is a pretty major item in and 
of itself and somewhat separate (being v.2-specific) from this specific JIRA, 
it may be sufficient to note it here and carry on that discussion on a v.2 
subtask. Sound good?

I'm fine with the current name "yarn.timeline-service.version". I just want to 
clarify the interpretation of this config on the cluster side and on the client 
side.

On the cluster side, it *should* always be interpreted as precisely which 
version of the timeline service should be up. If 
"yarn.timeline-service.version" is 1.5, and "yarn.timeline-service.enabled" is 
true, it should be understood as the cluster should bring up the timeline 
service v.1.5 (and nothing else), and the client can expect that to be the case.

On the client side, clearly a client that uses the same version should expect 
to succeed. If a client chooses to use a smaller version in spite of this, then 
depending on how robust the compatibility story is between versions, the 
results may vary (part of the rolling upgrade discussion included).
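
To make the cluster-side interpretation above concrete, here is an illustrative {{yarn-site.xml}} fragment matching the example: with these settings the cluster is expected to bring up timeline service v.1.5 and nothing else, and clients can rely on that. This is a sketch of the semantics under discussion, not a documented final configuration.

{code:title=yarn-site.xml (illustrative fragment)}
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.version</name>
  <value>1.5</value>
</property>
{code}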

> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch
>
>
> So far RM, MR AM, DA AM added/changed new config to enable the feature to 
> write the timeline data to v2 server. It's good to have a YARN 
> timeline-service.version config like timeline-service.enable to indicate the 
> version of the running timeline service with the given YARN cluster. It's 
> beneficial for users to more smoothly move from v1 to v2, as they don't need 
> to change the existing config, but switch this config from v1 to v2. And each 
> framework doesn't need to have their own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4356:
--
Attachment: YARN-4356-feature-YARN-2928.003.patch

Posted patch v.3.

Addressed the javadoc, findbugs, and checkstyle errors.

The unit tests are tests that are known to fail in the trunk or on our branch 
(e.g. YARN-4350, MAPREDUCE-6533, MAPREDUCE-6540, etc.).

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.003.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that once disabled the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4424) Fix deadlock in RMAppImpl

2015-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047679#comment-15047679
 ] 

Hudson commented on YARN-4424:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8943 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8943/])
YARN-4424. Fix deadlock in RMAppImpl. (Jian he via wangda) (wangda: rev 
7e4715186d31ac889fba26d453feedcebb11fc70)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


> Fix deadlock in RMAppImpl
> -
>
> Key: YARN-4424
> URL: https://issues.apache.org/jira/browse/YARN-4424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
>Priority: Blocker
> Fix For: 2.7.2, 2.6.3
>
> Attachments: YARN-4424.1.patch
>
>
> {code}
> yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn 
> application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
> 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: 
> http://XXX:8188/ws/v1/timeline/
> 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at 
> XXX/0.0.0.0:8050
> 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History 
> server at XXX/0.0.0.0:10200
> {code}
> {code:title=RM log}
> 2015-12-04 21:59:19,744 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000
> 2015-12-04 22:00:50,945 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000
> 2015-12-04 22:02:22,416 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000
> 2015-12-04 22:03:53,593 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 24
> 2015-12-04 22:05:24,856 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000
> 2015-12-04 22:06:56,235 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000
> 2015-12-04 22:08:27,510 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000
> 2015-12-04 22:09:58,786 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-08 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047675#comment-15047675
 ] 

Li Lu commented on YARN-3623:
-

Thanks for the review [~djp]! ATS v1.5 introduces some new APIs on top of the 
ATS v1 APIs. However, ATS v2 is not compatible with either version. I agree 
that a config would suffice to specify the "active" ATS version or the version 
of the writer API a client should use. Right now I think a config named 
"yarn.timeline-service.version" is fine because this leaves flexibility to 
allow a set of active ATS writer API versions in the system. Marking a latest 
version may not be very useful since ATS 1.x is not API-compatible with ATS 
v2.x. 

On the other hand, I totally agree there should be a comprehensive story for 
ATS rolling upgrade. IIUC, ATS v1 can be upgraded in a rolling fashion to v1.5. 
Meanwhile, if the ATS v1/1.5 server is available in the system, the v1.x server 
should be able to work with v2.x clients (since the v1 server won't be touched 
by the ATS v2 client). Therefore, I think the rolling upgrade story, from ATS 
v1.x to ATS v2, can be reduced to the ability for ATS v1 and ATS v2 servers to 
co-exist in the cluster? We can certainly have more discussion on the rolling 
upgrade in ATS v2 JIRAs. 

> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch
>
>
> So far RM, MR AM, DA AM added/changed new config to enable the feature to 
> write the timeline data to v2 server. It's good to have a YARN 
> timeline-service.version config like timeline-service.enable to indicate the 
> version of the running timeline service with the given YARN cluster. It's 
> beneficial for users to more smoothly move from v1 to v2, as they don't need 
> to change the existing config, but switch this config from v1 to v2. And each 
> framework doesn't need to have their own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-08 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4225:
-
Attachment: YARN-4225.004.patch

Thanks very much [~leftnoteasy], for your review and helpful comments.

{quote}
I'm OK with both approach - existing one in latest patch or simply return false 
if there's no such field in proto.
{quote}
So, if I understand correctly, you are okay with 
{{QueueInfo#getPreemptionDisabled}} returning {{Boolean}} with the possibility 
of returning {{null}} if the field doesn't exist. With that understanding, I'm 
leaving that in the latest patch.
{quote}
2) For QueueCLI, is it better to print "preemption is disabled/enabled" instead 
of "preemption status: disabled/enabled"?
{quote}
Actually, I think that leaving it as "Preemption : disabled/enabled" is more 
consistent with the way the other properties are displayed. What do you think?
{quote}
3) Is it possible to add a simple test to verify end-to-end behavior?
{quote}
I added a couple of tests to {{TestYarnCLI}}. Good suggestion.
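
For illustration, a minimal sketch of the CLI printing behavior discussed here, assuming {{getPreemptionDisabled}} may return a null {{Boolean}} when the proto field is absent (e.g. when talking to an older ResourceManager); the class name and output formatting are assumptions, not the actual QueueCLI code.

{code:title=Sketch: printing preemption status while tolerating a missing proto field (illustrative)}
import java.io.PrintWriter;

public class QueuePreemptionStatusPrinter {
  public static void printPreemptionStatus(PrintWriter out, Boolean preemptionDisabled) {
    if (preemptionDisabled == null) {
      // Field absent in the QueueInfo proto: print nothing rather than guessing.
      return;
    }
    out.println("\tPreemption : " + (preemptionDisabled ? "disabled" : "enabled"));
  }

  public static void main(String[] args) {
    PrintWriter out = new PrintWriter(System.out, true);
    printPreemptionStatus(out, Boolean.TRUE);   // prints "Preemption : disabled"
    printPreemptionStatus(out, null);           // prints nothing
  }
}
{code}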

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch, YARN-4225.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4436) DistShell ApplicationMaster.ExecBatScripStringtPath is misspelled

2015-12-08 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-4436:
--

 Summary: DistShell ApplicationMaster.ExecBatScripStringtPath is 
misspelled
 Key: YARN-4436
 URL: https://issues.apache.org/jira/browse/YARN-4436
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Affects Versions: 2.7.1
Reporter: Daniel Templeton
Assignee: Devon Michaels
Priority: Trivial


It should be ExecBatScriptStringPath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4424) Fix deadlock in RMAppImpl

2015-12-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4424:
-
Summary: Fix deadlock in RMAppImpl  (was: YARN CLI command hangs)

> Fix deadlock in RMAppImpl
> -
>
> Key: YARN-4424
> URL: https://issues.apache.org/jira/browse/YARN-4424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-4424.1.patch
>
>
> {code}
> yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn 
> application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
> 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: 
> http://XXX:8188/ws/v1/timeline/
> 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at 
> XXX/0.0.0.0:8050
> 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History 
> server at XXX/0.0.0.0:10200
> {code}
> {code:title=RM log}
> 2015-12-04 21:59:19,744 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000
> 2015-12-04 22:00:50,945 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000
> 2015-12-04 22:02:22,416 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000
> 2015-12-04 22:03:53,593 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 24
> 2015-12-04 22:05:24,856 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000
> 2015-12-04 22:06:56,235 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000
> 2015-12-04 22:08:27,510 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000
> 2015-12-04 22:09:58,786 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2015-12-08 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4435:
---
Assignee: Matthew Paduano

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Matthew Paduano
>Assignee: Matthew Paduano
> Attachments: proposed_solution
>
>
> Add a class to the YARN project that implements the DtFetcher interface to return 
> an RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563
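
Since the DtFetcher interface is not merged yet, here is a purely illustrative helper showing how such a fetcher might obtain the RM delegation token via {{YarnClient}}; the token alias and the overall shape are assumptions, not the attached proposed_solution.

{code:title=Sketch: fetching an RM delegation token (illustrative)}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.client.ClientRMProxy;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class RMDelegationTokenFetcherSketch {
  // Assumed alias under which the token is stored; the real fetcher may choose differently.
  private static final Text TOKEN_ALIAS = new Text("YARN_RM_DELEGATION_TOKEN");

  public static Token<?> fetchRMToken(Configuration conf, String renewer,
      Credentials creds) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    try {
      client.init(conf);
      client.start();
      // Ask the RM for a delegation token and convert it to a common-security Token,
      // tagged with the RM's delegation token service address.
      org.apache.hadoop.yarn.api.records.Token rmToken =
          client.getRMDelegationToken(new Text(renewer));
      Token<?> token = ConverterUtils.convertFromYarn(
          rmToken, ClientRMProxy.getRMDelegationTokenService(conf));
      creds.addToken(TOKEN_ALIAS, token);
      return token;
    } finally {
      client.stop();
    }
  }
}
{code}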



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047576#comment-15047576
 ] 

Chris Douglas commented on YARN-4248:
-

Thanks, Chris.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2015-12-08 Thread Matthew Paduano (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Paduano moved HADOOP-12599 to YARN-4435:


Assignee: (was: Matthew Paduano)
 Key: YARN-4435  (was: HADOOP-12599)
 Project: Hadoop YARN  (was: Hadoop Common)

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Matthew Paduano
> Attachments: proposed_solution
>
>
> Add a class to yarn project that implements the DtFetcher interface to return 
> a RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047553#comment-15047553
 ] 

Chris Nauroth commented on YARN-4248:
-

bq. Not sure why it wasn't flagged by test-patch.

I decided to dig into this.  At the time that pre-commit ran for YARN-4248, 
there was an unrelated license warning present in HDFS, introduced by HDFS-9414.

https://builds.apache.org/job/PreCommit-YARN-Build/9872/artifact/patchprocess/patch-asflicense-problems.txt

Unfortunately, if there is a pre-existing license warning, then the {{mvn 
apache-rat:check}} build halts at that first failing module.  Since 
hadoop-hdfs-client builds before hadoop-yarn-server-resourcemanager, it masked 
the new license warnings introduced by this patch.  This is visible in the report below if you scroll to the bottom: the Apache Hadoop HDFS Client module failed, and all subsequent modules were skipped.

https://builds.apache.org/job/PreCommit-YARN-Build/9872/artifact/patchprocess/patch-asflicense-root.txt

Maybe we can do better when there are pre-existing license warnings, perhaps by 
using the {{--fail-at-end}} option to make sure we check all modules.  I filed 
YETUS-221.
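
For reference, a follow-up run that keeps checking the remaining modules could look something like the following (this assumes Maven's standard {{--fail-at-end}}/{{-fae}} flag; the exact Yetus wiring is what YETUS-221 tracks):

{noformat}
mvn apache-rat:check --fail-at-end
{noformat}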

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047418#comment-15047418
 ] 

Hudson commented on YARN-4248:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8941 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8941/])
YARN-4248. Followup patch adding asf-licence exclusions for json test 
(cdouglas: rev 9f50e13d5dc329c3a6df7f9bcaf2f29b35adc52b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml


> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers

2015-12-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047400#comment-15047400
 ] 

Vinod Kumar Vavilapalli commented on YARN-1856:
---

Quick comments on the patch:

General
 - Should add all the configs to yarn-default.xml, noting that they are still early configs?
 - Should update the documentation of the pmem-check-enabled and vmem-check-enabled configs in code and yarn-default.xml to denote their relation to resource.memory.enabled.
 - Actually, given the existing memory monitoring mechanism, NM_MEMORY_RESOURCE_ENABLED is in reality already true when pmem/vmem checks are enabled. We need to reconcile the old and new configs somehow. Maybe memory is always enabled, but if the vmem/pmem configs are enabled, use the old handler, otherwise use the new one? Thinking out loud.
 - Do the soft and hard limits also somehow logically relate to pmem-vmem-ratio? If so, we should hint at that in the documentation.
 - Swappiness seems like a cluster configuration defaulting to zero. So far this has been an implicit contract with our users, so it is good to document it in yarn-default.xml as well.

Code comments
 - ResourceHandlerModule
-- Formatting of the new code is a little off: the declaration of {{getCgroupsMemoryResourceHandler()}}. There are other occurrences like this in that class before this patch; you may want to fix those as well.
-- BUG! getCgroupsMemoryResourceHandler() incorrectly locks DiskResourceHandler instead of MemoryResourceHandler (see the sketch after these comments).
 - CGroupsMemoryResourceHandlerImpl
-- What is this doing? {{  CGroupsHandler.CGroupController MEMORY = CGroupsHandler.CGroupController.MEMORY; }} Is it forcing a class-load or something? Not sure if this is needed; if it is, you may want to add a comment here.
 - NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC -> NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERCENTAGE. Similarly the default constant.
 - CGROUP_PARAM_MEMORY_HARD_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SWAPPINESS can all be static and final.
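
To make the locking bug concrete, here is a minimal, hedged sketch of the kind of fix being suggested; the field and helper names (for example {{getInitializedCGroupsHandler()}}) are assumptions for illustration and may not match the actual patch:

{code}
// Hypothetical sketch only: synchronize on MemoryResourceHandler.class,
// not DiskResourceHandler.class. Field and helper names are assumed.
public static MemoryResourceHandler getCgroupsMemoryResourceHandler(
    Configuration conf) throws ResourceHandlerException {
  if (cGroupsMemoryResourceHandler == null) {
    synchronized (MemoryResourceHandler.class) { // was DiskResourceHandler.class
      if (cGroupsMemoryResourceHandler == null) {
        cGroupsMemoryResourceHandler = new CGroupsMemoryResourceHandlerImpl(
            getInitializedCGroupsHandler(conf));
      }
    }
  }
  return cGroupsMemoryResourceHandler;
}
{code}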

> cgroups based memory monitoring for containers
> --
>
> Key: YARN-1856
> URL: https://issues.apache.org/jira/browse/YARN-1856
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Varun Vasudev
> Attachments: YARN-1856.001.patch, YARN-1856.002.patch, 
> YARN-1856.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047395#comment-15047395
 ] 

Chris Douglas commented on YARN-4248:
-

Pushed to trunk, branch-2, branch-2.8. Sorry to have missed these in review. 
Not sure why it wasn't flagged by test-patch.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-12-08 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: YARN-3946.v1.007.patch

Hi [~wangda],
I have incorporated the changes you suggested. Please take a look.

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> 
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, 
> YARN-3946.v1.007.patch
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/ memory requirement not being met, or AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047396#comment-15047396
 ] 

Carlo Curino commented on YARN-4248:


Thanks for spotting this and to [~chris.douglas] for the zero-latency fix. I 
spoke with him and he will commit it soon (as I am travelling at the moment).

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047389#comment-15047389
 ] 

Chris Nauroth commented on YARN-4248:
-

Hi [~curino].  It looks like [~chris.douglas] just uploaded a patch to set up 
an exclusion of the json files from the license check.  +1 for this.  Thanks, 
Chris.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4100) Add Documentation for Distributed Node Labels feature

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047386#comment-15047386
 ] 

Hadoop QA commented on YARN-4100:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 3s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 9s 
{color} | {color:green} hadoop-yarn-site in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 9s 
{color} | {color:green} hadoop-yarn-site in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 20s 
{color} | {color:red} Patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 16s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12776358/YARN-4100.v1.001.patch
 |
| JIRA Issue | YARN-4100 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  |
| uname | Linux f60b5fcdd61e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9676774 |
| JDK v1.7.0_9

[jira] [Updated] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4248:

Attachment: YARN-4248-asflicense.patch

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047378#comment-15047378
 ] 

Carlo Curino commented on YARN-4248:


Chris, I am happy to fix it, but if I am not mistaken JSON doesn't allow 
comments... Any advice?


> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248.2.patch, YARN-4248.3.patch, YARN-4248.5.patch, 
> YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-12-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047361#comment-15047361
 ] 

Sangjin Lee commented on YARN-4350:
---

I think either way is fine, although all things being equal I would slightly 
prefer the dynamic port. It's your call. :)

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These test fail more often than not if tested by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-12-08 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047285#comment-15047285
 ] 

Naganarasimha G R commented on YARN-4350:
-

Thanks [~sjlee0] & [~vrushalic],
bq. Can we go back to before YARN-2859 and restore this unit test for the time 
being
Would it be better to revert entirely, or to apply the change I mentioned ({{ServerSocketUtil.getPort}}) so that we avoid the fixed ports (which YARN-2859 was trying to solve) and also get the current test case fixed? I can add a comment in YARN-4372 noting that the fix done here is temporary!

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These test fail more often than not if tested by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-12-08 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047254#comment-15047254
 ] 

Vrushali C commented on YARN-4350:
--


I see, thanks [~Naganarasimha] for the clarification.

+1 on going back to before YARN-2859 and restoring this unit test for the time 
being.

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These test fail more often than not if tested by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-08 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4415:

Attachment: capacity-scheduler.xml

Hi [~wangda],
bq. I think QueueCapacitiesInfo should not assume maxCapacity will be > eps. We 
have normalizations while setting values to QueueCapacities, so we should copy 
exactly same value from QueueCapacities to QueueCapacitiesInfo (cap it between 
0 and 1 is fine).
The point I am trying to make here is that none of the capacities are configured for the given queue and partition, and hence QueueCapacities will not hold configured capacities for that label; when QueueCapacitiesInfo is queried for the non-existent label, it returns the default capacities of 0 and a max of 100 (though this can be corrected to be 1).

bq. It's a valid use case that a queue has max capacity = 0, for example, 
reservation system (YARN-1051) could dynamically adjust queue capacities.
I am not against the concept of configuring the max capacity to zero, but the default should not be zero; otherwise we will not be able to benefit from accessible node labels configured as {{*}}.

bq. I may not fully understand why we need to fetch parent queue's capacities 
while setting QueueCapacitiesInfo. As I mentioned above, QueueCapacities should 
have everything considered and calculated at QueueCapacities (including parent 
queue's capacities), correct
In the example scenarios I mentioned, the queue can access a particular partition, but the capacities for it are not configured, so QueueCapacities will not have that label. Also, when the accessible node label is configured as {{*}}, any new label can be added to the cluster and an NM can be mapped to it; but since the capacities are not configured for the queue, allocations cannot happen.

I hope this is clear; if not, I have uploaded my capacity-scheduler xml. Just create a new partition label xxx and try to submit a job for it in the default queue (the default queue is configured with accessible node labels as {{*}}). The job will not be able to proceed.


> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> 
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png, 
> capacity-scheduler.xml, screenshot-1.png
>
>
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue(default) with accessible node labels as *
> # create a exclusive partition *xxx* and map a NM to it
> # ensure no capacities are configured for default for label xxx
> # start an RM app with queue as default and label as xxx
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue
> Scenario 2:
> # create a nonexclusive partition *sharedPartition* and map a NM to it
> # ensure no capacities are configured for default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue for *sharedPartition*
> For both issues cause is the same default max capacity and abs max capacity 
> is set to Zero %



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047232#comment-15047232
 ] 

Hadoop QA commented on YARN-4356:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
5s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 15s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 0s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
6s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 0s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 2m 
27s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 17s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
feature-YARN-2928 has 3 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 28s 
{color} | {color:red} hadoop-yarn-common in feature-YARN-2928 failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in feature-YARN-2928 
failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 58s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 22m 2s {color} 
| {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 5 new issues (was 
780, now 780). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 30m 49s 
{color} | {color:red} root-jdk1.7.0_91 with JDK v1.7.0_91 generated 5 new 
issues (was 772, now 772). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 46s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 1s 
{color} | {color:red} Patch generated 26 new checkstyle issues in root (total 
was 1938, now 1932). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 2m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 
6s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 11s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_66 with JDK v1.8.0_66 
generated 6 new issues (was 100, now 100). {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-yarn-common in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} j

[jira] [Updated] (YARN-4368) Support Multiple versions of the timeline service at the same time

2015-12-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4368:
-
Labels: yarn-2928-1st-milestone  (was: )

> Support Multiple versions of the timeline service at the same time
> --
>
> Key: YARN-4368
> URL: https://issues.apache.org/jira/browse/YARN-4368
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> During rolling updgrade it will be helpfull to have the older version of the 
> timeline server to be also running so that the existing apps can submit to 
> the older version of ATS .



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4240) Add documentation for delegated-centralized node labels feature

2015-12-08 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R resolved YARN-4240.
-
Resolution: Duplicate

Will be handled as part of YARN-4100 itself

> Add documentation for delegated-centralized node labels feature
> ---
>
> Key: YARN-4240
> URL: https://issues.apache.org/jira/browse/YARN-4240
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Dian Fu
>Assignee: Dian Fu
>
> As a follow up of YARN-3964, we should add documentation for 
> delegated-centralized node labels feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4100) Add Documentation for Distributed Node Labels feature

2015-12-08 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4100:

Attachment: NodeLabel.html
YARN-4100.v1.001.patch

Hi [~wangda],[~dian.fu], [~devaraj.k] & [~rohithsharma],
Please review the attached patch for the documentation update on the different configuration types of Node Labels. This also covers the scope of YARN-4240.

> Add Documentation for Distributed Node Labels feature
> -
>
> Key: YARN-4100
> URL: https://issues.apache.org/jira/browse/YARN-4100
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: NodeLabel.html, YARN-4100.v1.001.patch
>
>
> Add Documentation for Distributed Node Labels feature



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047165#comment-15047165
 ] 

Chris Nauroth commented on YARN-4248:
-

This patch introduced license warnings on the testing json files.  Here is an 
example from the latest pre-commit run on HADOOP-11505.

https://builds.apache.org/job/PreCommit-HADOOP-Build/8202/artifact/patchprocess/patch-asflicense-problems.txt

Would you please either revert or quickly correct the license warning?  Thank 
you.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248.2.patch, YARN-4248.3.patch, YARN-4248.5.patch, 
> YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers

2015-12-08 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047156#comment-15047156
 ] 

Sidharta Seethana commented on YARN-1856:
-

Ugh. IDE snafu - somehow I ended up looking at an older version of the patch.

+1 on the latest version of the patch.

> cgroups based memory monitoring for containers
> --
>
> Key: YARN-1856
> URL: https://issues.apache.org/jira/browse/YARN-1856
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Varun Vasudev
> Attachments: YARN-1856.001.patch, YARN-1856.002.patch, 
> YARN-1856.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047143#comment-15047143
 ] 

Hadoop QA commented on YARN-4403:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
36s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 26s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK v1.8.0_66 
generated 1 new issues (was 14, now 14). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 0s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 0s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 26s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 3 ASF License warnings

[jira] [Commented] (YARN-4427) NPE on handleNMContainerStatus when NM is registering to RM

2015-12-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047139#comment-15047139
 ] 

Sunil G commented on YARN-4427:
---

Thanks [~brahma] for the details.
During recovery of an AppAttempt, {{masterContainer}} can be null only if the AttemptState does not contain it, which is very unlikely. If ZK was unstable, do you mean that a partial recovery happened here for the AppAttempt, leaving {{masterContainer}} null?
Could you also please share the final state of that attempt during recovery.
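
For illustration only, here is a hedged sketch of the kind of null guard being discussed, mirroring the handleNMContainerStatus snippet quoted in the issue description below (this is not the committed fix):

{code}
// Hedged sketch: guard against a missing master container during NM registration.
// Mirrors the snippet quoted below; the guard placement is illustrative only.
RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptId);
Container masterContainer =
    (rmAppAttempt == null) ? null : rmAppAttempt.getMasterContainer();
if (masterContainer != null
    && masterContainer.getId().equals(containerStatus.getContainerId())
    && containerStatus.getContainerState() == ContainerState.COMPLETE) {
  ContainerStatus status =
      ContainerStatus.newInstance(containerStatus.getContainerId(),
          containerStatus.getContainerState(),
          containerStatus.getDiagnostics(),
          containerStatus.getContainerExitStatus());
  // sending master container finished event.
  RMAppAttemptContainerFinishedEvent evt =
      new RMAppAttemptContainerFinishedEvent(appAttemptId, status, nodeId);
  rmContext.getDispatcher().getEventHandler().handle(evt);
}
{code}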


> NPE on handleNMContainerStatus when NM is registering to RM
> ---
>
> Key: YARN-4427
> URL: https://issues.apache.org/jira/browse/YARN-4427
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Critical
>
>  *Seen the following in one of our environment when AM got allocated 
> container but failed to updated in the ZK Where cluster is having network 
> problem for sometime(up and down).* 
> {noformat}
> 2015-12-07 16:39:38,489 | WARN  | IPC Server handler 49 on 26003 | IPC Server 
> handler 49 on 26003, call 
> org.apache.hadoop.yarn.server.api.ResourceTrackerPB.registerNodeManager from 
> 9.91.8.220:52169 Call#17 Retry#0 | Server.java:2107
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.handleNMContainerStatus(ResourceTrackerService.java:286)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:395)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceTrackerPBServiceImpl.registerNodeManager(ResourceTrackerPBServiceImpl.java:54)
> at 
> org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$2.callBlockingMethod(ResourceTracker.java:79)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2088)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2084)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2082)
> {noformat}
> Corresponding code, it might not match with {{branch-2.7/Trunk}} since we had 
> modified internally.
> {code}
>  284  RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptId);
>  285  Container masterContainer = rmAppAttempt.getMasterContainer();
>  286  if (masterContainer.getId().equals(containerStatus.getContainerId())
>  287   && containerStatus.getContainerState() == ContainerState.COMPLETE) 
> {
>  288 ContainerStatus status =
>  289 ContainerStatus.newInstance(containerStatus.getContainerId(),
>  290   containerStatus.getContainerState(), 
> containerStatus.getDiagnostics(),
>  291   containerStatus.getContainerExitStatus());
>  292 // sending master container finished event.
>  293 RMAppAttemptContainerFinishedEvent evt =
>  294 new RMAppAttemptContainerFinishedEvent(appAttemptId, status,
>  295 nodeId);
>  296 rmContext.getDispatcher().getEventHandler().handle(evt);
>  297   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread

2015-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047125#comment-15047125
 ] 

Hudson commented on YARN-4348:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #675 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/675/])
Update CHANGES.txt for commit of YARN-4348 to branch-2.7 and branch-2.6. 
(ozawa: rev d7b3f8dbe818cff5fee4f4c0c70d306776aa318e)
* hadoop-yarn-project/CHANGES.txt


> ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding 
> blocking ZK's event thread
> --
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, 
> YARN-4348.001.patch, YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause a following situation:
> 1. syncInternal timeouts, 
> 2. but sync succeeded later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version

2015-12-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047084#comment-15047084
 ] 

Junping Du commented on YARN-3623:
--

I am not quite familiar with the requirements of ATS v1.5. However, for ATS v2 I would agree with [~sjlee0]'s comments above to make this take effect on the writer side only (TimelineClient).
More clarifications:
1. This version configuration is meant to give applications/frameworks the flexibility to run on top of a YARN cluster with either ATS v1 or v2 running, by indicating the latest stable version of the ATS service that the cluster can support. The ATS v1 and v2 clients are different binary bits and so far use different, incompatible APIs to put information such as events, metrics, etc. By reading the proper configuration from YARN, an application can be aware of the ATS service version when landing on the YARN cluster and can choose the corresponding TimelineClient to push info, which gets rid of our pains in doing TestDistributedCache for the v1/v2 timeline service.

2. We should not break the rolling upgrade scenario, or this could be seen as an incompatible feature which cannot land on the 2.x branch. That also means we should support the ATS v1 and v2 services at the same time during a cluster upgrade, so that legacy/existing applications can still access their old ATS service, as in many other rolling upgrade stories.

Clarification 2 is the one more related to this change: we had better rename "yarn.timeline-service.version" to "yarn.timeline-service.latest.version" and describe it as "indicate to clients what is the latest stable version of the running timeline service" to get rid of any confusion here. It is also better to explicitly mention that our supported range for ATS is [X-1, X] for rolling upgrade (assume X is the latest stable ATS version).
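
As a purely illustrative, hedged sketch (the config key, default value, and branching below are assumptions, not the final design), this is roughly how a framework could pick a client once such a version config exists:

{code}
// Hedged illustration only; not the actual YARN API or the final config name.
import org.apache.hadoop.conf.Configuration;

public class TimelineVersionSelector {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The key and the 1.0f default are assumptions for illustration.
    float latestStableVersion =
        conf.getFloat("yarn.timeline-service.version", 1.0f);
    if (latestStableVersion >= 2.0f) {
      System.out.println("Cluster advertises ATS v2: use the v2 TimelineClient APIs");
    } else {
      System.out.println("Cluster advertises ATS v1: keep using the v1 TimelineClient APIs");
    }
  }
}
{code}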


> We should have a config to indicate the Timeline Service version
> 
>
> Key: YARN-3623
> URL: https://issues.apache.org/jira/browse/YARN-3623
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-3623-2015-11-19.1.patch
>
>
> So far RM, MR AM, DA AM added/changed new config to enable the feature to 
> write the timeline data to v2 server. It's good to have a YARN 
> timeline-service.version config like timeline-service.enable to indicate the 
> version of the running timeline service with the given YARN cluster. It's 
> beneficial for users to more smoothly move from v1 to v2, as they don't need 
> to change the existing config, but switch this config from v1 to v2. And each 
> framework doesn't need to have their own v1/v2 config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers

2015-12-08 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047082#comment-15047082
 ] 

Sidharta Seethana commented on YARN-1856:
-

[~vvasudev] , there is still an issue with the handling of the soft limit 
percentage. Isn't there a divide by 100 missing? 

{code}
long softLimit =
(long) (container.getResource().getMemory() * softLimitPerc);
{code}

The test code below needs to be updated too - instead of hard-coding the soft limit percentage in the test code, maybe we should use DEFAULT_NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC? It also looks like the validation of the memory value is not happening correctly below; you could use Mockito's {{eq()}} to verify argument values.

{code}
  verify(mockCGroupsHandler, times(1))
.updateCGroupParam(CGroupsHandler.CGroupController.MEMORY, id,
CGroupsHandler.CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES,
String.valueOf((int) (memory * 0.9)) + "M");
{code}
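
For what it is worth, a hedged sketch of the divide-by-100 fix and the {{eq()}}-based verification (names mirror the snippets above; this is not the final patch):

{code}
// Hedged sketch only; names follow the snippets above.
// Production side: treat the configured soft limit value as a percentage.
long softLimit =
    (long) (container.getResource().getMemory() * (softLimitPerc / 100.0f));

// Test side: verify argument values explicitly with Mockito's eq().
verify(mockCGroupsHandler, times(1)).updateCGroupParam(
    eq(CGroupsHandler.CGroupController.MEMORY), eq(id),
    eq(CGroupsHandler.CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES),
    eq(String.valueOf((int) (memory * 0.9)) + "M"));
{code}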

> cgroups based memory monitoring for containers
> --
>
> Key: YARN-1856
> URL: https://issues.apache.org/jira/browse/YARN-1856
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Varun Vasudev
> Attachments: YARN-1856.001.patch, YARN-1856.002.patch, 
> YARN-1856.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047066#comment-15047066
 ] 

Hadoop QA commented on YARN-4309:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 49s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 19s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 359, now 359). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 49s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 48s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 17s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_85. {color} |
| {col

[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-12-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047055#comment-15047055
 ] 

Sangjin Lee commented on YARN-4356:
---

The jenkins build didn't fire automatically. Kicking off a manual build.

> ensure the timeline service v.2 is disabled cleanly and has no impact when 
> it's turned off
> --
>
> Key: YARN-4356
> URL: https://issues.apache.org/jira/browse/YARN-4356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4356-feature-YARN-2928.002.patch, 
> YARN-4356-feature-YARN-2928.poc.001.patch
>
>
> For us to be able to merge the first milestone drop to trunk, we want to 
> ensure that, once disabled, the timeline service v.2 has no impact from the 
> server side to the client side. If the timeline service is not enabled, no 
> action should be done. If v.1 is enabled but not v.2, v.1 should behave the 
> same as it does before the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-12-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047052#comment-15047052
 ] 

Sangjin Lee commented on YARN-4350:
---

[~Naganarasimha]:
{quote}
I am not sure how to proceed with this jira, as it was introduced by YARN-2859 but 
the actual cause is YARN-4372, and I'm not sure we have any definitive solution for 
YARN-4372. As a temporary fix, shall I revert the YARN-2859 change and apply my 
solution so that we can proceed smoothly until YARN-4372 has a proper solution?
{quote}

I agree. Can we go back to before YARN-2859 and restore this unit test for the 
time being? It's not clear if YARN-4372 has a quick solution.

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not when run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period

2015-12-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046995#comment-15046995
 ] 

Sunil G commented on YARN-4403:
---

Yes. That's definitely a valid reason as per the current usage. Thank you very 
much for clarifying.

> (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating 
> period
> 
>
> Key: YARN-4403
> URL: https://issues.apache.org/jira/browse/YARN-4403
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4403-v2.patch, YARN-4403.patch
>
>
> Currently, (AM/NM/Container)LivelinessMonitor uses the current system time to 
> calculate the expiry period, which can be broken by settimeofday. We should use 
> Time.monotonicNow() instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI

2015-12-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046994#comment-15046994
 ] 

Sunil G commented on YARN-4413:
---

Hi [~templedf]
Thank you for the updated patch.

I have some doubts about the updated patch. I am not very sure about the move from 
DECOMMISSIONED to SHUTDOWN on a RECOMMISSION event; that event does not sound clean 
or correct for this transition. Why could we not send a SHUTDOWN event itself? I see 
no harm in doing that: after a refresh, such a node is found to be in a valid state 
as per the config but was DECOMMISSIONED by the RM, so it can be moved via a 
SHUTDOWN event. Please correct me if I am missing something here.
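
For illustration only (not from the patch, and assuming a SHUTDOWN event type is 
available), the kind of transition I mean would mirror how the RECOMMISSION event 
is fired today:

{code}
// Hypothetical sketch: for a node that is DECOMMISSIONED by the RM but present in the
// include list again after a refresh, fire SHUTDOWN directly instead of RECOMMISSION.
this.rmContext.getDispatcher().getEventHandler()
    .handle(new RMNodeEvent(nodeId, RMNodeEventType.SHUTDOWN));
{code}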

> Nodes in the includes list should not be listed as decommissioned in the UI
> ---
>
> Key: YARN-4413
> URL: https://issues.apache.org/jira/browse/YARN-4413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-4413.001.patch
>
>
> If I decommission a node and then move it from the excludes list back to the 
> includes list, but I don't restart the node, the node will still be listed by 
> the web UI as decommissioned until either the NM or RM is restarted. Ideally, 
> removing the node from the excludes list and putting it back into the 
> includes list should cause the node to be reported as shutdown instead.
> CC [~kshukla]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread

2015-12-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4348:
-
Comment: was deleted

(was: Sounds good. Thanks!)

> ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding 
> blocking ZK's event thread
> --
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, 
> YARN-4348.001.patch, YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread

2015-12-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046991#comment-15046991
 ] 

Junping Du commented on YARN-4348:
--

Sounds good. Thanks!

> ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding 
> blocking ZK's event thread
> --
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, 
> YARN-4348.001.patch, YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period

2015-12-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046989#comment-15046989
 ] 

Junping Du commented on YARN-4403:
--

It depends on whether any consumer of SystemClock is using it to track absolute 
time rather than a duration or interval. I didn't check the other call sites in 
YARN/MR, and theoretically there could be consumers outside of YARN given that this 
is a public API.
We may consider marking this API as deprecated later, once we have checked that all 
known call sites measure only durations or intervals. But for now, it seems better 
to keep the annotation unchanged and just add a NOTE.
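
For anyone following along, a minimal sketch of why durations should come from the 
monotonic clock (assuming hadoop-common on the classpath; not taken from the patch):

{code}
import org.apache.hadoop.util.Time;

public class MonotonicIntervalSketch {
  public static void main(String[] args) throws InterruptedException {
    // Monotonic timestamps are only meaningful as differences, but those differences
    // cannot jump if the wall clock is changed (settimeofday, NTP step).
    long start = Time.monotonicNow();
    Thread.sleep(100);
    System.out.println("monotonic elapsed ~" + (Time.monotonicNow() - start) + " ms");

    // A wall-clock delta can be wildly wrong if the system time is adjusted in between.
    long wallStart = System.currentTimeMillis();
    Thread.sleep(100);
    System.out.println("wall-clock elapsed ~" + (System.currentTimeMillis() - wallStart) + " ms");
  }
}
{code}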

> (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating 
> period
> 
>
> Key: YARN-4403
> URL: https://issues.apache.org/jira/browse/YARN-4403
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4403-v2.patch, YARN-4403.patch
>
>
> Currently, (AM/NM/Container)LivelinessMonitor uses the current system time to 
> calculate the expiry period, which can be broken by settimeofday. We should use 
> Time.monotonicNow() instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4309) Add debug information to application logs when a container fails

2015-12-08 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4309:

Attachment: YARN-4309.009.patch

Thanks for the reviews [~ivanmi] and [~leftnoteasy].

bq. I think it would be helpful to document what the methods are supposed to do.

Fixed.

bq. Do we want to remove the error check above, to be consistent with Linux, 
and to avoid failing due to a "logging" failure? Also, cp command does not 
exist on Windows. Please use copy instead. 

Fixed.

bq.  Why do you have both "dir" and "dir /AL /S" on Windows? Can you please 
include an inline comment with rationale.

The original intent was to try to detect broken symlinks but I'm not sure if 
that's possible using the dir command. I've removed the dir /AL /S command.

bq. In copyDebugInformation() you are also doing a chmod() internally. Wondering if 
this command should be injected by the call site, given that only the caller has 
context on what the destination is and whether special permission handling is 
needed. It might be possible to change the method to only accept src and copy the 
file to the current folder, in which case it might be fine to use chmod() given 
that there is an assumption about what the current folder is. Just a thought, you 
make the call.

Good point. Copying the file to the current folder doesn't work because the 
container launch script runs in the container work dir and we want these files 
to be uploaded as part of log aggregation. I've just added a check to make sure 
the path is absolute before attempting the chmod.
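
For what it's worth, a rough, hypothetical sketch of that guard (the name 
{{debugDst}} and the exact permission string are illustrative, not from the patch):

{code}
import java.io.File;
import org.apache.hadoop.fs.FileUtil;

public class ChmodGuardSketch {
  static void makeReadableForAggregation(File debugDst) throws Exception {
    // Only adjust permissions when the destination is an absolute path; a relative
    // path would silently depend on the container's current working directory.
    if (debugDst.isAbsolute()) {
      FileUtil.chmod(debugDst.getAbsolutePath(), "640");
    }
  }
}
{code}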

bq. I meant to print comment to the generated container_launch.sh for better 
readability. Such as:

Fixed.

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, 
> YARN-4309.009.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period

2015-12-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046961#comment-15046961
 ] 

Sunil G commented on YARN-4403:
---

Hi [~djp]
Thanks for the updated patch. I have one doubt here: we can see that 
{{SystemClock#getTime}} is *Public* and *Stable*. Now that there is a note saying 
it's advisable to use {{MonotonicClock}}, is any annotation change needed for 
{{SystemClock}}?

> (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating 
> period
> 
>
> Key: YARN-4403
> URL: https://issues.apache.org/jira/browse/YARN-4403
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4403-v2.patch, YARN-4403.patch
>
>
> Currently, (AM/NM/Container)LivelinessMonitor uses the current system time to 
> calculate the expiry period, which can be broken by settimeofday. We should use 
> Time.monotonicNow() instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread

2015-12-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046959#comment-15046959
 ] 

Hudson commented on YARN-4348:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8938 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8938/])
Update CHANGES.txt for commit of YARN-4348 to branch-2.7 and branch-2.6. 
(ozawa: rev d7b3f8dbe818cff5fee4f4c0c70d306776aa318e)
* hadoop-yarn-project/CHANGES.txt


> ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding 
> blocking ZK's event thread
> --
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, 
> YARN-4348.001.patch, YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-12-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046954#comment-15046954
 ] 

Sunil G commented on YARN-4386:
---

Hi [~kshukla]
Sorry for replying late here. 
bq. Unless there are 2 refreshNodes done in parallel such that the first 
deactivateNodeTransition has not finished and the other refreshNodes is also 
trying to do the same transition
Since the transitions happen under the write lock, this may not occur.

I have one suggestion here. You could mark a node for graceful decommission and 
ensure that the node is in the DECOMMISSIONING state (you can fire the event to 
RMNodeImpl directly to do this), then invoke {{refreshNodesGracefully}} and verify 
whether a RECOMMISSION event is raised to the dispatcher. Similarly, mark a node as 
DECOMMISSIONED, invoke {{refreshNodesGracefully}}, and verify that the RECOMMISSION 
event is *NOT* raised; in that second case the *for* loop is never entered. I feel 
this would cover our case here, even though it is not a direct check.
Please correct me if I am wrong.

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entry set from 
> getRMNodes(), which contains only active nodes (RUNNING, DECOMMISSIONING, etc.), 
> is used to check for 'decommissioned' nodes, which are present only in the 
> getInactiveRMNodes() map. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
> ...
>   // Recommissioning the nodes
>   if (entry.getValue().getState() == NodeState.DECOMMISSIONING
>       || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>     this.rmContext.getDispatcher().getEventHandler()
>         .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread

2015-12-08 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046951#comment-15046951
 ] 

Tsuyoshi Ozawa commented on YARN-4348:
--

Now I committed this to branch-2.6.3 too. Thanks!

> ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding 
> blocking ZK's event thread
> --
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, 
> YARN-4348.001.patch, YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread

2015-12-08 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046944#comment-15046944
 ] 

Tsuyoshi Ozawa commented on YARN-4348:
--

Ran tests locally and pass tests on branch-2.6. Committing this to branch-2.6.

> ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding 
> blocking ZK's event thread
> --
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, 
> YARN-4348.001.patch, YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread

2015-12-08 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046950#comment-15046950
 ] 

Tsuyoshi Ozawa commented on YARN-4348:
--

[~djp] I committed this to branch-2.6, which is targeting 2.6.3. Can I push 
this to branch-2.6.3?

> ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding 
> blocking ZK's event thread
> --
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Fix For: 2.6.3, 2.7.3
>
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, 
> YARN-4348.001.patch, YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046933#comment-15046933
 ] 

Hadoop QA commented on YARN-4381:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 (total was 116, now 120). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 59s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 27s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 20s 
{color} | {color:red} Patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 3s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12776320/YARN-4381.002.patch |
| JIRA Issue | YARN-4381 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 767fc930d7b8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit

[jira] [Updated] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period

2015-12-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4403:
-
Attachment: YARN-4403-v2.patch

Update patch with incorporate Jian's comments.

> (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating 
> period
> 
>
> Key: YARN-4403
> URL: https://issues.apache.org/jira/browse/YARN-4403
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4403-v2.patch, YARN-4403.patch
>
>
> Currently, (AM/NM/Container)LivelinessMonitor uses the current system time to 
> calculate the expiry period, which can be broken by settimeofday. We should use 
> Time.monotonicNow() instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-12-08 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4381:

Attachment: YARN-4381.002.patch

Thanks [~djp] for the review. I have made the container metrics more fine-grained. 
As you said, a container failure is not only caused by localization failures, so it 
is not suitable to add the metric on the launch event. Instead, I add the 
{{containerLaunchedSuccess}} metric when the container transitions to the RUNNING 
state and {{wasLaunched}} is set to true. Besides this, I add another two metrics 
for container-failure cases:
* one for containerFailedBeforeLaunched
* the other for containerKilledAfterLaunched
I think these metrics will help us understand a container's lifecycle more concretely.
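
To illustrate the shape of these counters, here is a hypothetical sketch in the 
metrics2 style used by NodeManagerMetrics (class, field and method names are 
illustrative, not the actual patch):

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

// Hypothetical class and metric names, for illustration only.
// (As with NodeManagerMetrics, an instance would be registered with the metrics system before use.)
@Metrics(about = "Sketch of per-container outcome metrics", context = "yarn")
public class ContainerOutcomeMetricsSketch {
  @Metric("# of containers launched successfully")
  MutableCounterInt containersLaunchedSuccess;
  @Metric("# of containers failed before launch (e.g. localization failure)")
  MutableCounterInt containersFailedBeforeLaunch;
  @Metric("# of containers killed after launch")
  MutableCounterInt containersKilledAfterLaunch;

  // Called when the container reaches RUNNING and wasLaunched is set to true.
  public void launchedSuccessfully() { containersLaunchedSuccess.incr(); }
  public void failedBeforeLaunch()   { containersFailedBeforeLaunch.incr(); }
  public void killedAfterLaunch()    { containersKilledAfterLaunch.incr(); }
}
{code}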

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch, YARN-4381.002.patch
>
>
> Recently, I found an issue with the NodeManager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number of 
> successfully launched containers. A container can still fail, for example when it 
> receives a kill command or when container localization fails, which leads to a 
> failed container. Yet this counter is currently increased in the code below 
> whether the container ends up starting successfully or failing.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a 
> container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period

2015-12-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4403:
-
Comment: was deleted

(was: Ok. That sounds good. Will update the patch soon.)

> (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating 
> period
> 
>
> Key: YARN-4403
> URL: https://issues.apache.org/jira/browse/YARN-4403
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4403.patch
>
>
> Currently, (AM/NM/Container)LivelinessMonitor uses the current system time to 
> calculate the expiry period, which can be broken by settimeofday. We should use 
> Time.monotonicNow() instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period

2015-12-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046873#comment-15046873
 ] 

Junping Du commented on YARN-4403:
--

Ok. That sounds good. Will update the patch soon.

> (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating 
> period
> 
>
> Key: YARN-4403
> URL: https://issues.apache.org/jira/browse/YARN-4403
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4403.patch
>
>
> Currently, (AM/NM/Container)LivelinessMonitor uses the current system time to 
> calculate the expiry period, which can be broken by settimeofday. We should use 
> Time.monotonicNow() instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

