[jira] [Moved] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran moved HADOOP-11826 to YARN-3539: --- Component/s: (was: documentation) documentation Affects Version/s: (was: 2.7.0) 2.7.0 Key: YARN-3539 (was: HADOOP-11826) Project: Hadoop YARN (was: Hadoop Common) Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3480: --- Summary: Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable (was: Make AM max attempts stored in RMStateStore to be configurable) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable Key: YARN-3480 URL: https://issues.apache.org/jira/browse/YARN-3480 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3480.01.patch When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, that makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the number of attempts stored in the RMStateStore configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
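To make the proposal concrete, here is a minimal sketch of the idea; the configuration key below is hypothetical and purely illustrative, and the trimming helper is not the actual patch:

{code}
import java.util.List;

// Hypothetical sketch only: the property key is illustrative, not the actual
// key added by YARN-3480. The idea is to cap how many attempts are persisted
// to the RMStateStore so recovery stays fast.
public final class StoredAttemptsCap {
  // hypothetical configuration key, for illustration
  static final String MAX_STORED_AM_ATTEMPTS =
      "yarn.resourcemanager.am.max-attempts.stored";

  // Keep only the most recent maxStored attempts for persistence.
  static <T> List<T> attemptsToStore(List<T> attempts, int maxStored) {
    int from = Math.max(0, attempts.size() - maxStored);
    return attempts.subList(from, attempts.size());
  }
}
{code}

With something like this, recovery replays at most a bounded number of attempts regardless of how large 'yarn.resourcemanager.am.max-attempts' is set.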
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509256#comment-14509256 ] Peng Zhang commented on YARN-3535: -- As per [~jlowe]'s thoughts, I understand there are two separate things here:
# During NM reconnection, RM and NM should sync at the container level. For this issue's scenario, container 04 should not be killed and rescheduled, so the AM can acquire and launch it on the NM after the NM has registered.
# We still need a fix in RMContainerImpl: restore the request during the transition from ALLOCATED to KILLED, because a real NM loss may cause a transition from ALLOCATED to KILLED with very small probability (the AM may heartbeat and acquire the container after the NM heartbeat times out).
I think the first item is an improvement to save time and preserve scheduling work already done. Or did I make a mistake? ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Attachments: syslog.tgz, yarn-app.log During a rolling update of the NM, the AM's start of a container on the NM failed, and the job then hung there. AM logs attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
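As a rough sketch of what "restore the request back to the scheduler" could look like in the kill transition, with all types below as simplified stand-ins for the real RM classes (this is not the actual YARN-3535 patch):

{code}
// Illustrative sketch, not the committed fix. Simplified stand-in types:
interface ResourceRequest {}
interface RMContainer { ResourceRequest getResourceRequest(); }
interface Scheduler { void recoverResourceRequest(ResourceRequest ask); }

final class KilledWhileAllocatedTransition {
  // On a kill while the container is still ALLOCATED (never acquired by the
  // AM), hand the original ask back to the scheduler so it can be
  // rescheduled instead of silently lost.
  void transition(RMContainer container, Scheduler scheduler) {
    ResourceRequest ask = container.getResourceRequest();
    if (ask != null) {
      scheduler.recoverResourceRequest(ask);
    }
  }
}
{code}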
[jira] [Updated] (YARN-3480) Make AM max attempts stored in RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3480: --- Attachment: YARN-3480.01.patch Attached an initial patch. I will add test cases later. Make AM max attempts stored in RMStateStore to be configurable -- Key: YARN-3480 URL: https://issues.apache.org/jira/browse/YARN-3480 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3480.01.patch When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, that makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the number of attempts stored in the RMStateStore configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3538) TimelineServer doesn't catch/translate all exceptions raised
Steve Loughran created YARN-3538: Summary: TimelineServer doesn't catch/translate all exceptions raised Key: YARN-3538 URL: https://issues.apache.org/jira/browse/YARN-3538 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Steve Loughran Priority: Minor Not all exceptions in TimelineServer are translated into web exceptions; only IOEs are. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
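A minimal sketch of the kind of blanket translation the report asks for, using the standard JAX-RS WebApplicationException; the helper and its status mapping are illustrative, not the TimelineServer's actual code:

{code}
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.Response;

// Illustrative helper: wrap *all* server-side failures as web exceptions,
// not only IOExceptions. The status mapping here is an assumption.
final class WebExceptions {
  static WebApplicationException translate(Exception e) {
    if (e instanceof WebApplicationException) {
      return (WebApplicationException) e;   // already a web exception
    }
    Response.Status status = (e instanceof IllegalArgumentException)
        ? Response.Status.BAD_REQUEST
        : Response.Status.INTERNAL_SERVER_ERROR;
    // keep the original exception as the cause so it is not lost
    return new WebApplicationException(e, status.getStatusCode());
  }
}
{code}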
[jira] [Commented] (YARN-3537) NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked
[ https://issues.apache.org/jira/browse/YARN-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509158#comment-14509158 ] Jason Lowe commented on YARN-3537: -- The code checks for a null store to avoid invoking the stop method, but then a few lines later it has the potential to invoke the canRecover method. Seems like we want to avoid doing anything at all in this method if the store is null. NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked Key: YARN-3537 URL: https://issues.apache.org/jira/browse/YARN-3537 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3537.patch 2015-04-23 19:30:34,961 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:181) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:326) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.tearDown(TestNodeManagerShutdown.java:106) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
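A minimal sketch of the guard being described, with a stand-in store type rather than the real NMStateStoreService: bail out of the whole method when the store was never created, instead of guarding only the stop() call.

{code}
import java.io.IOException;

// Sketch only, with simplified types; not the actual NodeManager code.
final class RecoveryStoreShutdown {
  interface RecoveryStore {   // stand-in for the NM state store service
    void stop() throws IOException;
    boolean canRecover();
  }

  void stopRecoveryStore(RecoveryStore store, boolean decommissioned)
      throws IOException {
    if (store == null) {
      return;  // serviceInit failed before the store existed: nothing to do
    }
    store.stop();
    if (decommissioned && store.canRecover()) {
      // remove persisted recovery state on decommission (elided)
    }
  }
}
{code}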
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509153#comment-14509153 ] Jason Lowe commented on YARN-3535: -- I think we need to fix the RMContainerImpl ALLOCATED to KILLED transition, but I think there's another bug here. I believe the container was killed in the first place because the RMNodeImpl reconnect transition makes an assumption that is racy. When the node reconnects, it checks whether the node reports no applications running. If it has no applications then it sends a removed-node event followed by an added-node event to the scheduler. This will cause the scheduler to kill all containers allocated on that node. However, the node will only know about a container if the AM acquires the container and tries to launch it on the node. That can take minutes to transpire, so it's dangerous to assume that a node not reporting any applications means it doesn't have anything pending. I think we'll have to revisit the solution to YARN-2561 to either eliminate this race or make it safe if it does occur. Ideally we shouldn't be sending a remove/add event to the scheduler if the node is reconnecting, but we need to make sure we cancel containers on the node that are no longer running. Since the node reports what containers it has when it reconnects, it seems like we can convey that information to the scheduler to correct anything that doesn't match up. Any container in the RUNNING state that no longer appears in the list of containers when registering can be killed by the scheduler, as it does when a node is removed, and I believe that will fix YARN-2561 and also avoid this race (see the sketch after this entry). cc: [~djp] as this also has potential ramifications for graceful decommission. If we try to gracefully decommission a node that isn't currently reporting applications, we may also need to verify the scheduler hasn't allocated or handed out a container for that node that hasn't reached the node yet. ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Attachments: syslog.tgz, yarn-app.log During a rolling update of the NM, the AM's start of a container on the NM failed, and the job then hung there. AM logs attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
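A rough sketch of the reconciliation proposed above, with stand-in types for the real RM classes (not the committed fix): on reconnect, rather than remove/add, kill only the RUNNING containers the node no longer reports, and leave ALLOCATED ones for the AM to acquire.

{code}
import java.util.List;
import java.util.Set;

// Sketch only; all types are simplified stand-ins.
final class ReconnectReconciler {
  interface RMContainerView {
    String getContainerId();
    boolean isRunning();   // true only once the container reached RUNNING
  }
  interface SchedulerNode {
    List<RMContainerView> getAllocatedContainers();
  }
  interface Scheduler {
    void killContainer(RMContainerView container);
  }

  void reconcile(SchedulerNode node, Set<String> containersReportedByNM,
      Scheduler scheduler) {
    for (RMContainerView c : node.getAllocatedContainers()) {
      // ALLOCATED containers are left alone: the AM may not have acquired
      // them yet, which is exactly the race described above.
      if (c.isRunning()
          && !containersReportedByNM.contains(c.getContainerId())) {
        scheduler.killContainer(c);  // removed-node handling, but targeted
      }
    }
  }
}
{code}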
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509152#comment-14509152 ] Thomas Graves commented on YARN-3517: -
{code}
+    // non-secure mode with no acls enabled
+    if (!isAdmin && !UserGroupInformation.isSecurityEnabled()
+        && !adminACLsManager.areACLsEnabled()) {
+      isAdmin = true;
+    }
{code}
We don't need the isSecurityEnabled check; just keep the one for areACLsEnabled. This could be combined with the previous if (make this the else-if part), but that isn't a big deal. Also, in QueuesBlock we are creating the AdminACLsManager on every web page load. Perhaps a better way would be to use this.rm.getApplicationACLsManager() and extend the ApplicationACLsManager to expose isAdmin functionality. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
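For clarity, a sketch of the simplified check being suggested, assuming AdminACLsManager's isAdmin/areACLsEnabled accessors; this is not the actual QueuesBlock code:

{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.security.AdminACLsManager;

// Sketch of the suggested simplification: drop the redundant
// security-enabled check and fold the ACLs-disabled case into an else-if.
final class AdminCheck {
  static boolean isAdmin(UserGroupInformation caller, AdminACLsManager acls) {
    if (acls.isAdmin(caller)) {
      return true;               // explicitly listed in the admin ACL
    } else if (!acls.areACLsEnabled()) {
      return true;               // no ACLs configured: everyone is an admin
    }
    return false;
  }
}
{code}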
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509237#comment-14509237 ] Sangjin Lee commented on YARN-3437: --- Thanks Junping. I initially went with ConcurrentHashMap when I first created this, as that is my preference as well. But it was really the need to prevent multiple threads from starting their collector (should that situation arise) that made ConcurrentHashMap not an option. Again, if we want both, we would need to look at the LoadingCache, but since this is really a low-contention situation, that would be overkill. The chances of this code running into lock contention should be low. convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
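To make the trade-off concrete, a small illustration with simplified types (not the actual collector code): with ConcurrentHashMap.putIfAbsent alone, the loser of the race may already have started its collector, so a plain synchronized check-then-act performs the start exactly once, at the cost of a little contention on this low-traffic path.

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch: start a collector at most once per key, under a lock.
final class CollectorRegistry<K> {
  interface Collector { void start(); }

  private final Map<K, Collector> collectors = new HashMap<>();

  synchronized Collector getOrStart(K appId, Collector fresh) {
    Collector existing = collectors.get(appId);
    if (existing != null) {
      return existing;     // someone else already started one
    }
    fresh.start();         // start exactly once, inside the lock
    collectors.put(appId, fresh);
    return fresh;
  }
}
{code}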
[jira] [Commented] (YARN-3471) Fix timeline client retry
[ https://issues.apache.org/jira/browse/YARN-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509138#comment-14509138 ] Steve Loughran commented on YARN-3471: -- Looking at this patch, it doesn't address the issues I've encountered in YARN-3477:
# when the retries time out, the exception causing the attempts to fail is not rethrown
# interrupts are being swallowed, making it impossible to reliably interrupt the thread
I'd like to get the YARN-3477 patches in as well as these ones, so let's see if we can do them in turn. Fix timeline client retry - Key: YARN-3471 URL: https://issues.apache.org/jira/browse/YARN-3471 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.8.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3471.1.patch, YARN-3471.2.patch I found that the client retry has some problems:
1. The new put methods will retry on all exceptions, but they should only do so upon ConnectException.
2. We can reuse TimelineClientConnectionRetry to simplify the retry logic.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
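A sketch of the retry behavior being asked for here, not the TimelineClient code itself: remember the last failure and rethrow it when retries run out, and surface interrupts instead of swallowing them.

{code}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;

// Illustrative retry loop; names and structure are assumptions.
final class Retries {
  static <T> T withRetries(Callable<T> op, int maxRetries, long sleepMs)
      throws IOException {
    IOException last = null;
    for (int i = 0; i <= maxRetries; i++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e;                          // remember the real cause
      } catch (Exception e) {
        throw new IOException(e);
      }
      if (i < maxRetries) {
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();  // keep the interrupt status
          throw (IOException) new InterruptedIOException(
              "interrupted during retry " + i).initCause(ie);
        }
      }
    }
    throw last;  // rethrow the last failure rather than a bare runtime error
  }
}
{code}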
[jira] [Commented] (YARN-3537) NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked
[ https://issues.apache.org/jira/browse/YARN-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509208#comment-14509208 ] Brahma Reddy Battula commented on YARN-3537: [~jlowe] thanks for taking a look into this issue. Yes, you are correct. Updated the patch; kindly review. NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked Key: YARN-3537 URL: https://issues.apache.org/jira/browse/YARN-3537 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3537-002.patch, YARN-3537.patch 2015-04-23 19:30:34,961 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:181) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:326) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.tearDown(TestNodeManagerShutdown.java:106) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509169#comment-14509169 ] Hudson commented on YARN-3434: -- FAILURE: Integrated in Hadoop-trunk-Commit #7646 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7646/])
YARN-3434. Interaction between reservations and userlimit can result in significant ULF violation (tgraves: rev 189a63a719c63b67a1783a280bfc2f72dcb55277)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0. The user was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to let the user limit be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3537) NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked
[ https://issues.apache.org/jira/browse/YARN-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3537: --- Attachment: YARN-3537-002.patch NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked Key: YARN-3537 URL: https://issues.apache.org/jira/browse/YARN-3537 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3537-002.patch, YARN-3537.patch 2015-04-23 19:30:34,961 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:181) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:326) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.tearDown(TestNodeManagerShutdown.java:106) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3537) NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked
[ https://issues.apache.org/jira/browse/YARN-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509288#comment-14509288 ] Hadoop QA commented on YARN-3537: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 5m 19s | There were no new checkstyle issues. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 3s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 5m 56s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 46m 52s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12727625/YARN-3537-002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 189a63a |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7472/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7472/testReport/ |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7472/console |
This message was automatically generated. NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked Key: YARN-3537 URL: https://issues.apache.org/jira/browse/YARN-3537 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3537-002.patch, YARN-3537.patch 2015-04-23 19:30:34,961 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:181) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:326) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.tearDown(TestNodeManagerShutdown.java:106) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3537) NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked
[ https://issues.apache.org/jira/browse/YARN-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509303#comment-14509303 ] Jason Lowe commented on YARN-3537: -- If the store is not null then we want to close it regardless of whether the context is null or not, because that means we opened it earlier. My previous point is that if we don't have a store then this method has nothing to do. If the context is null but the store isn't then there _is_ something that still needs to be done. NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked Key: YARN-3537 URL: https://issues.apache.org/jira/browse/YARN-3537 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3537-002.patch, YARN-3537.patch 2015-04-23 19:30:34,961 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:181) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:326) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.tearDown(TestNodeManagerShutdown.java:106) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509313#comment-14509313 ] Hadoop QA commented on YARN-3539: -
| (/) *{color:green}+1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 2m 52s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | site | 2m 55s | Site still builds. |
| | | 6m 15s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12727643/YARN-3539-003.patch |
| Optional Tests | site |
| git revision | trunk / 189a63a |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7474/console |
This message was automatically generated. Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, YARN-3539-003.patch The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2740: Attachment: YARN-2740.20150423-1.patch Hi [~wangda], the patch failed to apply on top of yesterday's check-ins, hence rebased; I have also corrected the trailing whitespace. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch, YARN-2740.20150423-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3477) TimelineClientImpl swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-3477: - Attachment: YARN-3477-001.patch Patch -001:
# rethrows the last received exception on a retry-count failure
# converts caught InterruptedExceptions to InterruptedIOException, which allows recipients to selectively look for that exception
# no longer swallows InterruptedExceptions during sleep
There are no tests here, because there's no easy way to exercise the failure paths. Close review is encouraged. There's one more thing we may want to do when handling the interrupts: re-enable the thread's interrupted flag. See [http://www.ibm.com/developerworks/library/j-jtp05236/] for the specifics. I don't see any harm in doing this, and as it helps preserve the interrupted state, it can only be a good thing. TimelineClientImpl swallows exceptions -- Key: YARN-3477 URL: https://issues.apache.org/jira/browse/YARN-3477 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0, 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-3477-001.patch If the timeline client fails more than the retry count, the original exception is not thrown. Instead some runtime exception is raised saying retries have run out.
# the failing exception should be rethrown, ideally via NetUtils.wrapException, to include the URL of the failing endpoint
# otherwise, the raised RTE should (a) state that URL and (b) set the original fault as the inner cause
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
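A minimal sketch of the interrupt-preserving pattern from the linked article (not the patch itself): restore the interrupt status before rethrowing, so callers that poll Thread.interrupted() still observe it.

{code}
import java.io.IOException;
import java.io.InterruptedIOException;

// Sketch only; method name and structure are assumptions.
final class InterruptibleSleep {
  static void sleep(long millis) throws IOException {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();  // re-enable the interrupted flag
      throw (IOException)
          new InterruptedIOException("sleep interrupted").initCause(ie);
    }
  }
}
{code}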
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should be settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509556#comment-14509556 ] Hudson commented on YARN-3413: -- FAILURE: Integrated in Hadoop-trunk-Commit #7650 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7650/]) YARN-3413. Changed Nodelabel attributes (like exclusivity) to be settable only via addToClusterNodeLabels but not changeable at runtime. (Wangda Tan via vinodkv) (vinodkv: rev f5fe35e297ed4a00a1ba93d090207ef67cebcc9d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/event/StoreUpdateNodeLabelsEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/event/NodeLabelsStoreEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeLabel.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ClusterCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShellWithNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/DummyCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UpdateNodeLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestWorkPreservingRMRestartForNodeLabel.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/RMNodeLabel.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java *
[jira] [Commented] (YARN-3537) NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked
[ https://issues.apache.org/jira/browse/YARN-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509366#comment-14509366 ] Brahma Reddy Battula commented on YARN-3537: Thanks again for your input. Updated patch. NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked Key: YARN-3537 URL: https://issues.apache.org/jira/browse/YARN-3537 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3537-002.patch, YARN-3537-003.patch, YARN-3537.patch 2015-04-23 19:30:34,961 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:181) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:326) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.tearDown(TestNodeManagerShutdown.java:106) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3537) NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked
[ https://issues.apache.org/jira/browse/YARN-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509452#comment-14509452 ] Hadoop QA commented on YARN-3537: - (!) The patch artifact directory on has been removed! This is a fatal error for test-patch.sh. Aborting. Jenkins (node H8) information at https://builds.apache.org/job/PreCommit-YARN-Build/7477/ may provide some hints. NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked Key: YARN-3537 URL: https://issues.apache.org/jira/browse/YARN-3537 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3537-002.patch, YARN-3537-003.patch, YARN-3537.patch 2015-04-23 19:30:34,961 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:181) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:326) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.tearDown(TestNodeManagerShutdown.java:106) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509455#comment-14509455 ] Wangda Tan commented on YARN-2740: -- Seems like an env issue; re-kicked Jenkins. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch, YARN-2740.20150423-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should be settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509497#comment-14509497 ] Vinod Kumar Vavilapalli commented on YARN-3413: --- I tried running test-patch.sh on my own box, but ran into issues w.r.t. checkstyle. I'd think some of the checkstyle issues are the same as HADOOP-11869. Will create a ticket to address all of the YARN checkstyle issues once HADOOP-11869 gets addressed. Checking this in now. Node label attributes (like exclusivity) should be settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch, YARN-3413.7.patch As mentioned in https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947, changing node label exclusivity and/or other attributes may not be a real use case, and we should also support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should be settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509559#comment-14509559 ] Wangda Tan commented on YARN-3413: -- Thanks Vinod for review & commit! Node label attributes (like exclusivity) should be settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch, YARN-3413.7.patch As mentioned in https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947, changing node label exclusivity and/or other attributes may not be a real use case, and we should also support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509451#comment-14509451 ] Hadoop QA commented on YARN-2740: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:red}-1{color} | javac | 7m 45s | The applied patch generated 122 additional warning messages. |
| {color:red}-1{color} | javadoc | 9m 56s | The applied patch generated 9 additional warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 5m 30s | The applied patch generated 3 additional checkstyle issues. |
| {color:green}+1{color} | install | 1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 3s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 2m 3s | Tests passed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 19m 31s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 66m 32s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
| | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs |
| | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue |
| | hadoop.yarn.server.resourcemanager.TestRM |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes |
| | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
| | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
| | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerFairShare |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt |
| | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior |
| | hadoop.yarn.server.resourcemanager.TestMoveApplication |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog |
| | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher |
| | hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
| | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
| | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
| | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
| | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| | hadoop.yarn.server.resourcemanager.TestApplicationMasterService |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs |
| | hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy |
| | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
| | hadoop.yarn.server.resourcemanager.scheduler.TestQueueMetrics |
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509470#comment-14509470 ] Hudson commented on YARN-3319: -- FAILURE: Integrated in Hadoop-trunk-Commit #7648 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7648/])
YARN-3319. Implement a FairOrderingPolicy. (Craig Welch via wangda) (wangda: rev 395205444e8a9ae6fc86f0a441e98486a775511a)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/TestFairOrderingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FairOrderingPolicy.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/CompoundComparator.java
Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch, YARN-3319.75.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with the least current usage, very similar to the FairScheduler's FairSharePolicy. The policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight-style adjustment. Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the value below: Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
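A worked sketch of the comparison described above, with a simplified process type standing in for the real SchedulerProcess; this is not the actual FairOrderingPolicy code:

{code}
import java.util.Comparator;

// Sketch: order by adjusted usage, smallest first, tie-broken by app id.
final class FairOrderingSketch {
  static final class Proc {
    final String appId; final long usedMB; final long demandMB;
    Proc(String appId, long usedMB, long demandMB) {
      this.appId = appId; this.usedMB = usedMB; this.demandMB = demandMB;
    }
  }

  // sizeBasedWeight adjustment: divide usage by log2(1 + demand), per the
  // formula in the description, so large apps are not permanently starved
  // by the natural preference for small ones.
  static double adjustedUsage(Proc p, boolean sizeBasedWeight) {
    double usage = p.usedMB;
    if (sizeBasedWeight && p.demandMB > 0) {
      usage /= Math.log1p(p.demandMB) / Math.log(2);
    }
    return usage;
  }

  static Comparator<Proc> comparator(boolean sizeBasedWeight) {
    return Comparator
        .comparingDouble((Proc p) -> adjustedUsage(p, sizeBasedWeight))
        .thenComparing((Proc p) -> p.appId);  // lexical fallback, roughly FIFO
  }
}
{code}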
[jira] [Updated] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3517: Affects Version/s: (was: 2.7.0) RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3477) TimelineClientImpl swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509409#comment-14509409 ] Hadoop QA commented on YARN-3477: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 5m 22s | The applied patch generated 1 additional checkstyle issues. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 8s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 7m 32s | Tests passed in hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests | 1m 53s | Tests failed in hadoop-yarn-common. |
| | | 51m 18s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.client.api.impl.TestTimelineClient |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12727644/YARN-3477-001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 49f6e3d |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7475/artifact/patchprocess/checkstyle-result-diff.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7475/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7475/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7475/testReport/ |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7475/console |
This message was automatically generated. TimelineClientImpl swallows exceptions -- Key: YARN-3477 URL: https://issues.apache.org/jira/browse/YARN-3477 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0, 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-3477-001.patch If the timeline client fails more than the retry count, the original exception is not thrown. Instead some runtime exception is raised saying retries have run out.
# the failing exception should be rethrown, ideally via NetUtils.wrapException, to include the URL of the failing endpoint
# otherwise, the raised RTE should (a) state that URL and (b) set the original fault as the inner cause
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509449#comment-14509449 ] Wangda Tan commented on YARN-2740: -- Thanks [~Naganarasimha], will commit once Jenkins gets back. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch, YARN-2740.20150423-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3190) NM can't aggregate logs: token can't be found in cache
[ https://issues.apache.org/jira/browse/YARN-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu resolved YARN-3190. --- Resolution: Duplicate. Issue is fixed by YARN-2964. NM can't aggregate logs: token can't be found in cache --- Key: YARN-3190 URL: https://issues.apache.org/jira/browse/YARN-3190 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Environment: CDH 5.3.1, HA HDFS, Kerberos Reporter: Andrejs Dubovskis Priority: Minor In rare cases the node manager cannot aggregate logs, generating this exception:
{code}
2015-02-12 13:04:03,703 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Starting aggregate log-file for app application_1423661043235_2150 at /tmp/logs/catalyst/logs/application_1423661043235_2150/catdn001.intrum.net_8041.tmp
2015-02-12 13:04:03,707 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /data5/yarn/nm/usercache/catalyst/appcache/application_1423661043235_2150/container_1423661043235_2150_01_000442
2015-02-12 13:04:03,707 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /data6/yarn/nm/usercache/catalyst/appcache/application_1423661043235_2150/container_1423661043235_2150_01_000442
2015-02-12 13:04:03,707 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /data7/yarn/nm/usercache/catalyst/appcache/application_1423661043235_2150/container_1423661043235_2150_01_000442
2015-02-12 13:04:03,709 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /data1/yarn/nm/usercache/catalyst/appcache/application_1423661043235_2150
2015-02-12 13:04:03,709 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:catalyst (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 2334644 for catalyst) can't be found in cache
2015-02-12 13:04:03,709 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 2334644 for catalyst) can't be found in cache
2015-02-12 13:04:03,709 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:catalyst (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 2334644 for catalyst) can't be found in cache
2015-02-12 13:04:03,712 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:catalyst (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 2334644 for catalyst) can't be found in cache
2015-02-12 13:04:03,712 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Cannot create writer for app application_1423661043235_2150. Disabling log-aggregation for this app.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 2334644 for catalyst) can't be found in cache
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy19.getServerDefaults(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getServerDefaults(ClientNamenodeProtocolTranslatorPB.java:259)
at sun.reflect.GeneratedMethodAccessor114.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy20.getServerDefaults(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:966)
at org.apache.hadoop.fs.Hdfs.getServerDefaults(Hdfs.java:159)
at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:543)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:680)
at
{code}
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509464#comment-14509464 ] Jian He commented on YARN-3522: --- lgtm, thanks Zhijie! DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch, YARN-3522.2.patch, YARN-3522.3.patch YARN-3287 breaks the timeline access control of distributed shell. In the distributed shell AM:
{code}
if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
  // Creating the Timeline Client
  timelineClient = TimelineClient.createTimelineClient();
  timelineClient.init(conf);
  timelineClient.start();
} else {
  timelineClient = null;
  LOG.warn("Timeline service is not enabled");
}
{code}
{code}
ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() {
  @Override
  public TimelinePutResponse run() throws Exception {
    return timelineClient.putEntities(entity);
  }
});
{code}
YARN-3287 changes the timeline client to get the right ugi at serviceInit, but the DS AM still doesn't use the submitter ugi to init the timeline client; it only uses that ugi for each put-entity call. This results in the wrong user on the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
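For context, a minimal sketch of the fix direction described above — creating and starting the client inside the submitter's UGI so its UGI-sensitive initialization happens as the right user, rather than wrapping only each putEntities() call. An {{appSubmitterUgi}} is assumed to be in scope; the exact wiring in the committed patch may differ.
{code}
timelineClient = appSubmitterUgi.doAs(
    new PrivilegedExceptionAction<TimelineClient>() {
      @Override
      public TimelineClient run() throws Exception {
        // init/start as the submitter, so the client captures the right UGI
        TimelineClient client = TimelineClient.createTimelineClient();
        client.init(conf);
        client.start();
        return client;
      }
    });
{code}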
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509508#comment-14509508 ] Hudson commented on YARN-3522: -- FAILURE: Integrated in Hadoop-trunk-Commit #7649 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7649/]) YARN-3522. Fixed DistributedShell to instantiate TimeLineClient as the correct user. Contributed by Zhijie Shen (jianhe: rev aa4a192feb8939353254d058c5f81bddbd0335c0)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSFailedAppMaster.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* hadoop-yarn-project/CHANGES.txt
DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.1 Attachments: YARN-3522.1.patch, YARN-3522.2.patch, YARN-3522.3.patch YARN-3287 breaks the timeline access control of distributed shell. In the distributed shell AM:
{code}
if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
  // Creating the Timeline Client
  timelineClient = TimelineClient.createTimelineClient();
  timelineClient.init(conf);
  timelineClient.start();
} else {
  timelineClient = null;
  LOG.warn("Timeline service is not enabled");
}
{code}
{code}
ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() {
  @Override
  public TimelinePutResponse run() throws Exception {
    return timelineClient.putEntities(entity);
  }
});
{code}
YARN-3287 changes the timeline client to get the right ugi at serviceInit, but the DS AM still doesn't use the submitter ugi to init the timeline client; it only uses that ugi for each put-entity call. This results in the wrong user on the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509230#comment-14509230 ] Brahma Reddy Battula commented on YARN-3528: Planning to give 0 for all HTTP and RPC ports... Tests with 12345 as hard-coded port break jenkins - Key: YARN-3528 URL: https://issues.apache.org/jira/browse/YARN-3528 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Environment: ASF Jenkins Reporter: Steve Loughran Assignee: Brahma Reddy Battula Priority: Blocker A lot of the YARN tests have hard-coded the port 12345 for their services to come up on. This makes it impossible for scheduled or precommit tests to run consistently on the ASF jenkins hosts. Instead the tests fail regularly and appear to get ignored completely. A quick grep for 12345 shows up many places in the test suite where this practice has developed:
* All {{BaseContainerManagerTest}} subclasses
* {{TestNodeManagerShutdown}}
* {{TestContainerManager}} + others
This needs to be addressed through port scanning and dynamic port allocation. Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
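A minimal sketch of the dynamic-port idea (assuming a {{Configuration conf}} in scope; {{YarnConfiguration.NM_ADDRESS}} is shown as one example key, other services would be analogous):
{code}
// Bind to port 0 so the OS picks a free ephemeral port, then feed the chosen
// port into the test configuration instead of a hard-coded 12345.
try (ServerSocket socket = new ServerSocket(0)) {
  int port = socket.getLocalPort();
  conf.set(YarnConfiguration.NM_ADDRESS, "127.0.0.1:" + port);
}
{code}
Note that probing with a throwaway socket still leaves a small reuse race; having the service under test bind to port 0 itself and report the bound port avoids that race entirely.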
[jira] [Commented] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509263#comment-14509263 ] Hadoop QA commented on YARN-3480: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727636/YARN-3480.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 189a63a | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7473/console | This message was automatically generated. Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable Key: YARN-3480 URL: https://issues.apache.org/jira/browse/YARN-3480 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3480.01.patch When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, that makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the number of attempts stored in the RMStateStore configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
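A sketch of the shape of the proposal; the configuration key and helper below are hypothetical, used only to illustrate capping stored attempts independently of the scheduling limit:
{code}
// Cap how many attempts the RMStateStore keeps; older attempts are evicted
// so RM recovery does not have to replay every historical attempt.
// "storedAttemptIds" is assumed to be a FIFO queue of stored attempt ids.
int maxStoredAttempts = conf.getInt(
    "yarn.resourcemanager.state-store.max-completed-attempts", // hypothetical key
    conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
        YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS));
while (storedAttemptIds.size() > maxStoredAttempts) {
  removeApplicationAttemptState(storedAttemptIds.poll()); // hypothetical helper
}
{code}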
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509276#comment-14509276 ] Jason Lowe commented on YARN-3535: -- The first item is to avoid containers failing due to an NM restart. As it is now, a container handed out by the RM to an idle NM can fail if the NM restarts before the AM launches the container. ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Attachments: syslog.tgz, yarn-app.log During rolling update of NM, AM start of container on NM failed. And then job hang there. Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
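A sketch of the RMContainerImpl-side fix idea (all names here are hypothetical; the real patch may hang this off the ALLOCATED -> KILLED transition differently):
{code}
// When a container is killed while still ALLOCATED (the AM never acquired
// it), hand its ResourceRequests back to the application so the scheduler
// can allocate a replacement instead of losing the pending ask.
public void onContainerKilledAtAllocated(RMContainer rmContainer,
    SchedulerApplicationAttempt app) {
  List<ResourceRequest> asks = rmContainer.getResourceRequests(); // hypothetical accessor
  if (asks != null && !asks.isEmpty()) {
    app.recoverResourceRequests(asks); // hypothetical: restore pending requests
  }
}
{code}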
[jira] [Updated] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-3539: - Attachment: YARN-3539-003.patch Patch -003 updates {{timelineserver.md}}:
# specify the REST API (non-normative)
# add some more on futures
# review configuration options
# fix up broken internal links by adding the anchors
# yarn/index.html includes a link to the ATS REST API as one of the listed REST APIs
Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, YARN-3539-003.patch The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3537) NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked
[ https://issues.apache.org/jira/browse/YARN-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3537: --- Attachment: YARN-3537-003.patch NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked Key: YARN-3537 URL: https://issues.apache.org/jira/browse/YARN-3537 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3537-002.patch, YARN-3537-003.patch, YARN-3537.patch
2015-04-23 19:30:34,961 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:181)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:326)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.tearDown(TestNodeManagerShutdown.java:106)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509774#comment-14509774 ] Li Lu commented on YARN-3431: - Hi [~zjshen], thanks for the update! The latest patch LGTM. Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch, YARN-3431.4.patch, YARN-3431.5.patch, YARN-3431.6.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509641#comment-14509641 ] Craig Welch commented on YARN-3319: --- Yes, it's configured in the capacity scheduler configuration with something like this:
{code}
<property>
  <name>(yarn-queue-prefix).ordering-policy.fair.enable-size-based-weight</name>
  <value>true</value>
</property>
{code}
Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch, YARN-3319.75.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment. Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
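To make the description concrete, here is a sketch of the comparison key with the size-based-weight adjustment; the accessor names are illustrative, not the actual patch's API:
{code}
// Smaller magnitude is served first. With sizeBasedWeight enabled, usage is
// divided by a demand-based boost so large apps are not perpetually starved.
double getMagnitude(SchedulableEntity app) {
  double usage = getCurrentUsageMemory(app);   // illustrative accessor
  if (sizeBasedWeight) {
    double weight = Math.log1p(getDemandMemory(app)) / Math.log(2);
    usage /= weight;
  }
  return usage; // ties fall back to application id comparison (lexical FIFO)
}
{code}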
[jira] [Resolved] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2032. --- Resolution: Won't Fix It will be covered in YARN-2928 Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Li Lu Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3540) Fetcher#copyMapOutput is leaking usedMemory upon IOException during InMemoryMapOutput shuffle handler
Eric Payne created YARN-3540: Summary: Fetcher#copyMapOutput is leaking usedMemory upon IOException during InMemoryMapOutput shuffle handler Key: YARN-3540 URL: https://issues.apache.org/jira/browse/YARN-3540 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Priority: Blocker We are seeing this happen when:
- an NM's disk goes bad during the creation of map output(s)
- the reducer's fetcher can read the shuffle header and reserve the memory
- but gets an IOException when trying to shuffle for InMemoryMapOutput
- shuffle fetch retry is enabled
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
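A sketch of where the leak sits in the fetcher; the calls mirror the MR shuffle code, but treat the exact signatures as illustrative:
{code}
// Memory is reserved when the shuffle header is read; if the in-memory
// shuffle then fails with an IOException, the reservation must be released
// (e.g. via abort()) before the fetch is retried, or usedMemory leaks.
MapOutput<K, V> mapOutput = merger.reserve(mapId, decompressedLength, id);
try {
  mapOutput.shuffle(host, input, compressedLength, decompressedLength,
      metrics, reporter);
} catch (IOException ioe) {
  mapOutput.abort(); // release the reserved memory on failure
  throw ioe;
}
{code}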
[jira] [Updated] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3517: Attachment: YARN-3517.005.patch Uploaded a new patch to fix the whitespace and checkstyle errors. The test failure is unrelated to the patch. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
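For reference, a minimal sketch of what an admin-only guard for this endpoint could look like (hedged: the actual patch's wiring may differ; {{getCallerUGI}} is a hypothetical helper):
{code}
// Reject non-admin callers before serving the "dump scheduler logs" action.
UserGroupInformation callerUGI = getCallerUGI(request); // hypothetical helper
if (callerUGI == null || !adminACLsManager.isAdmin(callerUGI)) {
  throw new ForbiddenException("Only admins can dump scheduler logs.");
}
{code}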
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509738#comment-14509738 ] Hadoop QA commented on YARN-3363: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 26s | The applied patch generated 3 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 3s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 0s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 46m 41s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727688/YARN-3363.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 416b843 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7480/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7480/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7480/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7480/console | This message was automatically generated. add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Attachments: YARN-3363.000.patch, YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Currently ContainerMetrics has container's actual memory usage(YARN-2984), actual CPU usage(YARN-3122), resource and pid(YARN-3022). It will be better to have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509648#comment-14509648 ] Hadoop QA commented on YARN-2740: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 25s | The applied patch generated 3 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 2s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 52m 24s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 98m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727647/YARN-2740.20150423-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3952054 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7479/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7479/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7479/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7479/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7479/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7479/console | This message was automatically generated. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch, YARN-2740.20150423-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. 
- CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509722#comment-14509722 ] zhihai xu commented on YARN-2893: - Hi [~jianhe], Do you think any of my earlier suggestions are reasonable? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509717#comment-14509717 ] Zhijie Shen commented on YARN-3437: --- Sorry for the late comments. This patch has half MR code and half YARN code; it's not good to commit it as one patch. I have a thought on managing the commits:
1. The YARN code nearly duplicates YARN-3390. As YARN-3390 is almost ready, we can get that patch in first.
2. Move this jira to the MR project, retain only the MR code in the patch, and do some minor rebasing according to YARN-3390.
3. TimelineServicePerformanceTest is in a different package and has a different name, so hopefully it won't conflict with YARN-2556. Once YARN-2556 gets committed, we just need to refactor TimelineServicePerformanceTest to reuse the YARN-2556 code. BTW, can we put TimelineServicePerformanceTest into the same package as TimelineServicePerformance in YARN-2556, and rename it to TimelineServicePerformanceTestv2?
What do you think about the plan for the commits? W.r.t. the patch, I'm a bit concerned that a write containing one event per entity is not very typical of real use cases, and configuration and metrics are not covered at all. Would it be more realistic to write an entity with 10 events and 10 metrics, which have 100 points in the time series? And one nit in the patch: {{entity.setEntityType(TEZ_DAG_ID);}}. How about not mentioning TEZ in the MR code? convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509630#comment-14509630 ] Vinod Kumar Vavilapalli commented on YARN-3319: --- Is FairOrderingPolicy.ENABLE_SIZE_BASED_WEIGHT supposed to be admin visible? If so, we need a better, fully qualified name. Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch, YARN-3319.75.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment. Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509591#comment-14509591 ] Hadoop QA commented on YARN-3517: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 24s | The applied patch generated 2 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 39s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 2m 2s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 53m 59s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 98m 18s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727652/YARN-3517.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 49f6e3d | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7478/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7478/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7478/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7478/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7478/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7478/console | This message was automatically generated. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch, YARN-3517.004.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509689#comment-14509689 ] Zhijie Shen commented on YARN-3529: --- Previously, we ran into a compatibility issue with depending on HBase 0.9x. I'm not sure if HBase has resolved it, or whether we have a way to work around it. Please take a look at HADOOP-10995 and YARN-2032. Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch gets merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509905#comment-14509905 ] Sangjin Lee commented on YARN-3437: --- Well it's not entirely true. It seems I still need to change TimelineCollector.getTimelineEntityContext() from protected to public. But creating another YARN JIRA just to make those several lines of changes seems too much. Thoughts folks? convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509932#comment-14509932 ] Zhijie Shen commented on YARN-3437: ---
bq. How's that sound?
That also sounds good to me.
bq. We would use that one for more realistic load whereas we could keep this mode as a simpler test. Thoughts?
It's okay to make it a simpler case, but could we at least cover one config and one metric, so we can verify that the DB storing this info also works?
bq. But creating another YARN JIRA just to make those several lines of changes seems too much
A couple of lines of YARN changes in an MR patch is okay.
convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3437: -- Issue Type: New Feature (was: Sub-task) Parent: (was: YARN-3378) convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3390: -- Attachment: YARN-3390.2.patch Reuse TimelineCollectorManager for RM - Key: YARN-3390 URL: https://issues.apache.org/jira/browse/YARN-3390 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3390.1.patch, YARN-3390.2.patch RMTimelineCollector should have the context info of each app whose entity has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509961#comment-14509961 ] Li Lu commented on YARN-3411: - Oh, and one thing to add, in the added pom file, maybe we can centralize the version of hbase (the Phoenix patch also has this problem)? This may make version management slightly easier. Maybe we can address this problem together with the Phoenix one in YARN-3529? [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509962#comment-14509962 ] Zhijie Shen commented on YARN-3390: --- Thanks for the comments. I've addressed Sangjin's and Li's comments except:
bq. maybe we'd like to mark it as unstable?
It's not an API for users, hence it's okay to leave it unannotated.
bq. In TimelineCollectorWebService, why are we removing the utility function getCollector?
After the refactoring, we don't need to convert appId to a string, and it's not necessary to wrap a single statement in a method.
In addition, I changed it to use a hook in TimelineCollectorManager, but postRemove is called before stopping the collector, because once the collector is stopped, the hook may not be able to do anything with it. Moreover, I moved RMApp.stopTimelineCollector into FinalTransition. Assuming the collector only collects application lifecycle events, it doesn't need to stay around after the app is finished. We can adjust this later if we find the collector needs to stay after the app is finished. Reuse TimelineCollectorManager for RM - Key: YARN-3390 URL: https://issues.apache.org/jira/browse/YARN-3390 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3390.1.patch, YARN-3390.2.patch RMTimelineCollector should have the context info of each app whose entity has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
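A small illustration of the ordering described above (names simplified; this is a sketch, not the patch itself):
{code}
// Run the post-removal hook before stopping the collector, so the hook can
// still interact with a live collector.
TimelineCollector collector = collectors.remove(appId);
if (collector != null) {
  postRemove(appId, collector); // hook sees the collector still running
  collector.stop();
}
{code}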
[jira] [Commented] (YARN-2498) Respect labels in preemption policy of capacity scheduler for inter-queue preemption
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509967#comment-14509967 ] Jian He commented on YARN-2498: ---
- remove below
{code}
NavigableSet<FiCaSchedulerApp> ns =
    (NavigableSet<FiCaSchedulerApp>) leafQueue.getApplications();
{code}
- this piece duplicates the addToPreemptMap method?
{code}
Set<RMContainer> toPreemptContainers =
    preemptMap.get(fc.getApplicationAttemptId());
if (null == toPreemptContainers) {
  toPreemptContainers = new HashSet<RMContainer>();
}
preemptMap.put(fc.getApplicationAttemptId(), toPreemptContainers);
{code}
- the code below at line 744 duplicates the check at line 650?
{code}
if (resToObtainByPartition.isEmpty()) {
  return;
}
{code}
- tryPreemptContainerAndDeductResToObtain can also include the addToPreemptMap method so that every caller doesn't need to invoke that.
- rename TempQueuePartition -> TempQueuePerPartition
- a few long lines: e.g. tryPreemptContainerAndDeductResToObtain
- remove LeafQueue#getIgnoreExclusivityResourceByPartition
- simplify below a bit (one possible version is sketched after this message)
{code}
private TempQueuePartition getQueueByPartition(String queueName,
    String partition) {
  if (!queueToPartitions.containsKey(queueName)) {
    return null;
  }
  if (!queueToPartitions.get(queueName).containsKey(partition)) {
    return null;
  }
  return queueToPartitions.get(queueName).get(partition);
}
{code}
Respect labels in preemption policy of capacity scheduler for inter-queue preemption Key: YARN-2498 URL: https://issues.apache.org/jira/browse/YARN-2498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: STALED-YARN-2498.zip, YARN-2498.1.patch, YARN-2498.2.patch There are 3 stages in ProportionalCapacityPreemptionPolicy:
# Recursively calculate {{ideal_assigned}} for each queue. This depends on available resource, resource used/pending in each queue, and the guaranteed capacity of each queue.
# Mark to-be-preempted containers: for each over-satisfied queue, it will mark some containers to be preempted.
# Notify the scheduler about to-be-preempted containers.
We need to respect labels in the cluster for both #1 and #2: For #1, when calculating ideal_assigned for each queue, we need to get the by-partition ideal_assigned according to the queue's guaranteed/maximum/used/pending resource on the specific partition. For #2, when we decide whether to preempt a container, we need to make sure the resource of this container is *possibly* usable by a queue which is under-satisfied and has pending resource. In addition, we need to handle the ignore_partition_exclusivity case: when we need to preempt containers from a queue's partition, we will first preempt ignore_partition_exclusivity allocated containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
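As referenced in the review item above, one possible simplification of {{getQueueByPartition}} that avoids the repeated map lookups (assuming {{queueToPartitions}} is a {{Map<String, Map<String, TempQueuePerPartition>>}}, with the rename applied):
{code}
private TempQueuePerPartition getQueueByPartition(String queueName,
    String partition) {
  // one lookup for the inner map, one for the partition entry
  Map<String, TempQueuePerPartition> partitions =
      queueToPartitions.get(queueName);
  return partitions == null ? null : partitions.get(partition);
}
{code}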
[jira] [Commented] (YARN-3536) ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover
[ https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509802#comment-14509802 ] zhihai xu commented on YARN-3536: - Is this issue similar to YARN-2834? ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover -- Key: YARN-3536 URL: https://issues.apache.org/jira/browse/YARN-3536 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.4.1 Reporter: gu-chi Here is a scenario where the Application status is FAILED/FINISHED but the AppAttempt status is null; this causes an NPE during recovery with yarn.resourcemanager.work-preserving-recovery.enabled set to true. RM should handle recovery gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509841#comment-14509841 ] Jason Lowe commented on YARN-3476: -- I'm OK with deleting the logs upon error uploading. It should be a rare occurrence, and log availability is already a best-effort rather than guaranteed service. Even if we try to retain the logs it has questionable benefit in practice, as the history of a job always points to the aggregated logs, not the node's copy of the logs, and thus the logs will still be lost from the end-user's point of view. Savvy users may realize the logs could still be on the original node, but most won't know to check there or how to form the URL to find them. If we always point to the node then that defeats one of the features of log aggregation, since loss of the node will mean the node's URL is bad and we fail to show the logs even if they are aggregated. So for now I say we keep it simple and just cleanup the files on errors to prevent leaks. Speaking of which I took a look at the patch. It will fix the particular error we saw with TFiles, but there could easily be other non-IOExceptions that creep out of the code, especially as it is maintained over time. Would it be better to wrap the cleanup in a finally block or something a little more broadly applicable to errors that occur? Nodemanager can fail to delete local logs if log aggregation fails -- Key: YARN-3476 URL: https://issues.apache.org/jira/browse/YARN-3476 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation, nodemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Rohith Attachments: 0001-YARN-3476.patch If log aggregation encounters an error trying to upload the file then the underlying TFile can throw an illegalstateexception which will bubble up through the top of the thread and prevent the application logs from being deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
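A sketch of the finally-based cleanup suggested above (the method names here are hypothetical placeholders for the aggregator's upload and deletion steps):
{code}
// Make local-log cleanup unconditional so that any Throwable from the TFile
// writer, not just IOException, cannot leak the local logs.
try {
  uploadLogsForContainers(); // may throw IllegalStateException from TFile
} finally {
  // best-effort: delete local logs even if the upload failed
  deleteLocalLogs();
}
{code}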
[jira] [Commented] (YARN-3509) CollectorNodemanagerProtocol's authorization doesn't work
[ https://issues.apache.org/jira/browse/YARN-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509864#comment-14509864 ] Junping Du commented on YARN-3509: -- [~zjshen], thanks for updating the patch. Shall we wait for the security design for the v2 timeline service to be finalized, then come back to your patch? CollectorNodemanagerProtocol's authorization doesn't work - Key: YARN-3509 URL: https://issues.apache.org/jira/browse/YARN-3509 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, security, timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3509.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509934#comment-14509934 ] Zhijie Shen commented on YARN-3437: --- Oh, previously I said TimelineServicePerformanceTestv2, but actually I meant TimelineServicePerformanceV2. Just a minor suggestion, and it's up to you to find the suitable class name. convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509890#comment-14509890 ] Hadoop QA commented on YARN-3517: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 52s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 5m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 39s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 52m 23s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 96m 50s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727691/YARN-3517.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 416b843 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7481/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7481/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7481/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7481/console | This message was automatically generated. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509862#comment-14509862 ] Sangjin Lee commented on YARN-3437: --- Thanks for your comments [~zjshen].
{quote}
1. The YARN code nearly duplicates YARN-3390. As YARN-3390 is almost ready, we can get that patch in first.
2. Move this jira to the MR project, retain only the MR code in the patch, and do some minor rebasing according to YARN-3390.
{quote}
Let's do this. While working on YARN-3438, I'm realizing that for the performance tests it is probably OK to use the TimelineCollectors directly and bypass the TimelineCollectorManager altogether. If we do that, then this could become purely an MR patch. I'll update this patch to remove the use of TimelineCollectorManager and move this JIRA to MAPREDUCE. How's that sound?
{quote}
3. TimelineServicePerformanceTest is in a different package and has a different name, so hopefully it won't conflict with YARN-2556. Once YARN-2556 gets committed, we just need to refactor TimelineServicePerformanceTest to reuse the YARN-2556 code. BTW, can we put TimelineServicePerformanceTest into the same package as TimelineServicePerformance in YARN-2556, and rename it to TimelineServicePerformanceTestv2?
{quote}
That's fine. I'll move it back to the same package.
{quote}
W.r.t. the patch, I'm a bit concerned that a write containing one event per entity is not very typical of real use cases, and configuration and metrics are not covered at all. Would it be more realistic to write an entity with 10 events and 10 metrics, which have 100 points in the time series? And one nit in the patch: entity.setEntityType(TEZ_DAG_ID);. How about not mentioning TEZ in the MR code?
{quote}
Note that this is adding simple entity writes. The more realistic part of the test is coming in YARN-3438 (I'm nearly finished with that), and it will have multiple levels of entities as well as metrics and configuration. We would use that one for a more realistic load whereas we could keep this mode as a simpler test. Thoughts? I'll change the name of the entity to be something else. convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509951#comment-14509951 ] Li Lu commented on YARN-3411: - Hi [~vrushalic], thanks for the patch! I'm OK with the major part of this patch for now. Here, I'm listing some questions that we can discuss.
# About null checks: so far we do not have a fixed standard on whether and where we need to do null checks. I noticed you assumed info, config, event, and other similar fields are not null. Maybe we'd like to explicitly decide when all those fields can be null or empty.
# Maybe we'd like to change TimelineWriterUtils to the default access modifier? I think it would be sufficient to make it visible within the package.
# One thing I'd like to open for discussion is how to store and process metrics. Currently, in the hbase patch, startTime and endTime are not used. In the Phoenix patch, I store time series as flattened, non-queryable strings. I think this part also requires some hints from the time-based aggregations.
# Another thing I'd like to discuss here is whether and how we'd like to set up a separate fast path for metric-only updates. On the storage layer, I'd strongly +1 a separate fast path such that we only touch the (frequently updated) metrics table. Any proposals, everyone?
[Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not be cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509330#comment-14509330 ] Junping Du commented on YARN-3505: -- Thanks for uploading a patch to fix this problem, [~xgong]! I am reviewing your patch. In the meantime, can you fix the findbugs warning message, which seems to be related to the code? Node's Log Aggregation Report with SUCCEED should not be cached in RMApps -- Key: YARN-3505 URL: https://issues.apache.org/jira/browse/YARN-3505 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.8.0 Reporter: Junping Du Assignee: Xuan Gong Priority: Critical Attachments: YARN-3505.1.patch Per discussions in YARN-1402, we shouldn't cache all nodes' log aggregation reports in RMApps forever, especially for those finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3154) Should not upload partial logs for MR jobs or other short-running applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-3154: - Release Note: Applications that made use of the LogAggregationContext will need to revisit this code in order to make sure that their logs continue to get rolled out. Hadoop Flags: Incompatible change,Reviewed (was: Reviewed) Should not upload partial logs for MR jobs or other short-running applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. We should only upload partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show this timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3363: Attachment: YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show this timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Attachments: YARN-3363.000.patch, YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show this timing information for each active container. Currently ContainerMetrics has the container's actual memory usage (YARN-2984), actual CPU usage (YARN-3122), and resource and pid (YARN-3022). It would be better to also have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show this timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509674#comment-14509674 ] zhihai xu commented on YARN-3363: - Hi [~adhoot], thanks for the thorough review. Your suggestions are reasonable. I uploaded a new patch, YARN-3363.001.patch, which addresses all your comments. Please review it. Thanks! add localization and container launch time to ContainerMetrics at NM to show this timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Attachments: YARN-3363.000.patch, YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show this timing information for each active container. Currently ContainerMetrics has the container's actual memory usage (YARN-2984), actual CPU usage (YARN-3122), and resource and pid (YARN-3022). It would be better to also have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
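A minimal sketch of how such timings might be recorded, assuming hypothetical gauge names on the metrics2-based ContainerMetrics (the actual patch may differ):
{code}
// Hypothetical sketch: per-container timing gauges in a metrics2 source.
@Metric("Localization duration (ms)")
MutableGaugeLong localizationDurationMs;

@Metric("Container launch duration (ms)")
MutableGaugeLong launchDurationMs;

public void recordLocalizationDuration(long millis) {
  localizationDurationMs.set(millis); // set once localization completes
}
{code}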
[jira] [Commented] (YARN-3516) killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
[ https://issues.apache.org/jira/browse/YARN-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510058#comment-14510058 ] Xuan Gong commented on YARN-3516: - +1 LGTM. Will commit. killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. --- Key: YARN-3516 URL: https://issues.apache.org/jira/browse/YARN-3516 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-3516.000.patch killing the ContainerLocalizer doesn't take effect when the private localizer receives a FETCH_FAILURE status. This is a typo from YARN-3024. With YARN-3024, the ContainerLocalizer will be killed only if {{action}} is set to {{LocalizerAction.DIE}}; the value set by calling {{response.setLocalizerAction}} will be overwritten. This is also a regression from the old code. It also makes sense to kill the ContainerLocalizer when FETCH_FAILURE happens, because the container will send a CLEANUP_CONTAINER_RESOURCES event after the localization failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
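The overwrite described above can be sketched with hypothetical names as follows; setting the response early is useless when a local action variable is written into the response afterwards:
{code}
// Hypothetical sketch of the bug pattern, not the actual ResourceLocalizationService code.
LocalizerAction action = LocalizerAction.LIVE;
if (fetchFailed) {
  response.setLocalizerAction(LocalizerAction.DIE); // has no lasting effect...
}
// ...because the response is unconditionally overwritten later:
response.setLocalizerAction(action);

// The fix direction: set the local variable instead, so the final write carries DIE.
if (fetchFailed) {
  action = LocalizerAction.DIE;
}
{code}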
[jira] [Updated] (YARN-2498) Respect labels in preemption policy of capacity scheduler for inter-queue preemption
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2498: - Attachment: YARN-2498.4.patch Attached ver.4; updated the test to make sure ignore_partition_exclusivity containers will be added to/removed from the queue's map. Respect labels in preemption policy of capacity scheduler for inter-queue preemption Key: YARN-2498 URL: https://issues.apache.org/jira/browse/YARN-2498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: STALED-YARN-2498.zip, YARN-2498.1.patch, YARN-2498.2.patch, YARN-2498.3.patch, YARN-2498.4.patch There are 3 stages in ProportionalCapacityPreemptionPolicy: # Recursively calculate {{ideal_assigned}} for each queue. This depends on the available resource, the resource used/pending in each queue, and the guaranteed capacity of each queue. # Mark to-be-preempted containers: for each over-satisfied queue, it will mark some containers to be preempted. # Notify the scheduler about to-be-preempted containers. We need to respect labels in the cluster for both #1 and #2: For #1, when calculating ideal_assigned for each queue, we need to get the by-partition ideal-assigned according to the queue's guaranteed/maximum/used/pending resource on the specific partition. For #2, when we make a decision about whether we need to preempt a container, we need to make sure the resource of this container is *possibly* usable by a queue which is under-satisfied and has pending resource. In addition, we need to handle the ignore_partition_exclusivity case: when we need to preempt containers from a queue's partition, we will first preempt ignore_partition_exclusivity-allocated containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
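A minimal sketch of what the by-partition version of stage #1 might look like; the types and helpers below are simplified stand-ins, not the actual ProportionalCapacityPreemptionPolicy code:
{code}
// Hypothetical sketch: compute ideal_assigned per partition rather than per cluster.
for (String partition : nodePartitions) {
  long available = availableByPartition.get(partition);
  for (TempQueue q : queues) {
    long guaranteed = q.guaranteedOn(partition);
    long demand = q.usedOn(partition) + q.pendingOn(partition);
    // Start each queue from the smaller of its demand and its guarantee on
    // this partition; leftover partition capacity is distributed afterwards.
    long ideal = Math.min(demand, guaranteed);
    q.setIdealAssigned(partition, ideal);
    available -= ideal;
  }
  // ... then distribute the remaining 'available' to queues whose demand
  // exceeds their guarantee on this partition ...
}
{code}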
[jira] [Commented] (YARN-3516) killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
[ https://issues.apache.org/jira/browse/YARN-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510080#comment-14510080 ] Xuan Gong commented on YARN-3516: - Committed into trunk/branch-2. Thanks, zhihai! killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. --- Key: YARN-3516 URL: https://issues.apache.org/jira/browse/YARN-3516 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix For: 2.8.0 Attachments: YARN-3516.000.patch killing the ContainerLocalizer doesn't take effect when the private localizer receives a FETCH_FAILURE status. This is a typo from YARN-3024. With YARN-3024, the ContainerLocalizer will be killed only if {{action}} is set to {{LocalizerAction.DIE}}; the value set by calling {{response.setLocalizerAction}} will be overwritten. This is also a regression from the old code. It also makes sense to kill the ContainerLocalizer when FETCH_FAILURE happens, because the container will send a CLEANUP_CONTAINER_RESOURCES event after the localization failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3536) ZK exception occurs when updating AppAttempt status, then NPE is thrown when RM does recovery
[ https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510320#comment-14510320 ] gu-chi commented on YARN-3536: -- Thanks. As the exception stack trace is almost the same, I once looked into this ticket. That patch is already merged into the current environment I use, so it's not the same cause. ZK exception occurs when updating AppAttempt status, then NPE is thrown when RM does recovery -- Key: YARN-3536 URL: https://issues.apache.org/jira/browse/YARN-3536 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.4.1 Reporter: gu-chi Here is a scenario where the Application status is FAILED/FINISHED but the AppAttempt status is null. This causes an NPE when doing recovery with yarn.resourcemanager.work-preserving-recovery.enabled set to true; the RM should handle recovery gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
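A minimal sketch of the kind of defensive guard recovery could use; the names are hypothetical and the actual fix may differ:
{code}
// Hypothetical sketch: tolerate an attempt whose state was never persisted.
ApplicationAttemptStateData attemptState = appState.getAttempt(attemptId);
if (attemptState == null || attemptState.getState() == null) {
  LOG.warn("Attempt " + attemptId + " has no persisted state; recovering "
      + appId + " from its final application state instead of failing with an NPE");
} else {
  recoverAttempt(attemptState);
}
{code}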
[jira] [Commented] (YARN-3492) AM fails to come up because RM and NM can't connect to each other
[ https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510099#comment-14510099 ] Vinod Kumar Vavilapalli commented on YARN-3492: --- [~kasha], can you try this on a different box or something to see if this is an env issue? Tx. AM fails to come up because RM and NM can't connect to each other - Key: YARN-3492 URL: https://issues.apache.org/jira/browse/YARN-3492 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Environment: pseudo-distributed cluster on a mac Reporter: Karthik Kambatla Priority: Blocker Attachments: mapred-site.xml, yarn-kasha-nodemanager-kasha-mbp.local.log, yarn-kasha-resourcemanager-kasha-mbp.local.log, yarn-site.xml Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The container gets allocated, but doesn't get launched. The NM can't talk to the RM. Logs to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3534) Report node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3534: -- Attachment: YARN-3534-1.patch No unit tests yet. Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3534) Report node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3534: -- Attachment: YARN-3534-2.patch Merged to trunk. Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
Zhijie Shen created YARN-3541: - Summary: Add version info on timeline service / generic history web UI and REST API Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3438) add a mode to replay MR job history files to the timeline service
[ https://issues.apache.org/jira/browse/YARN-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510142#comment-14510142 ] Hadoop QA commented on YARN-3438: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727769/YARN-3438.000.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 0b3f895 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7486/console | This message was automatically generated. add a mode to replay MR job history files to the timeline service - Key: YARN-3438 URL: https://issues.apache.org/jira/browse/YARN-3438 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3438.000.patch The subtask covers the work on top of YARN-3437 to add a mode to replay MR job history files to the timeline service storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3438) add a mode to replay MR job history files to the timeline service
[ https://issues.apache.org/jira/browse/YARN-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510150#comment-14510150 ] Junping Du commented on YARN-3438: -- Thanks [~sjlee0] for uploading a patch! Given that 90% of the work is on the MR side, let's move this to the MapReduce project, under the umbrella of MAPREDUCE-6331. add a mode to replay MR job history files to the timeline service - Key: YARN-3438 URL: https://issues.apache.org/jira/browse/YARN-3438 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3438.000.patch The subtask covers the work on top of YARN-3437 to add a mode to replay MR job history files to the timeline service storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3541: -- Attachment: YARN-3541.1.patch Uploaded a patch: 1. Include version info in /ws/v1/timeline of the timeline API 2. Add the endpoint at /ws/v1/applicationhistory/about and show version info of the generic history API 3. Add an about page of the generic history service and show version info. 4. Add test cases correspondingly. I've tried the patch locally. The web service and UI look good. Add version info on timeline service / generic history web UI and REST API - Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3541.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
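A minimal sketch of the shape such an about endpoint could return, assuming a hypothetical JAXB bean on top of the existing YarnVersionInfo/VersionInfo utilities (the committed patch may differ):
{code}
// Hypothetical sketch of a version-info DAO for the /about endpoints.
@XmlRootElement(name = "about")
@XmlAccessorType(XmlAccessType.FIELD)
public class AboutInfo {
  private String timelineServiceVersion = YarnVersionInfo.getVersion();
  private String timelineServiceBuildVersion = YarnVersionInfo.getBuildVersion();
  private String hadoopVersion = VersionInfo.getVersion();
  private String hadoopBuildVersion = VersionInfo.getBuildVersion();
}
{code}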
[jira] [Updated] (YARN-3382) Some of UserMetricsInfo metrics are incorrectly set to root queue metrics
[ https://issues.apache.org/jira/browse/YARN-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3382: -- Fix Version/s: (was: 2.8.0) 2.7.1 This seems like an important fix. I merged this into branch-2.7. Some of UserMetricsInfo metrics are incorrectly set to root queue metrics - Key: YARN-3382 URL: https://issues.apache.org/jira/browse/YARN-3382 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Fix For: 2.7.1 Attachments: YARN-3382.patch {{appsCompleted}}, {{appsPending}}, {{appsRunning}} etc. in {{UserMetricsInfo}} are incorrectly set to the root queue's value instead of the user's value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
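A minimal sketch of the likely shape of the fix, assuming the per-user counters exposed by QueueMetrics.getUserMetrics (the committed patch may differ):
{code}
// Hypothetical sketch: read the user's counters instead of the root queue's.
QueueMetrics userMetrics = rootQueueMetrics.getUserMetrics(user);
if (userMetrics != null) {
  this.appsCompleted = userMetrics.getAppsCompleted();
  this.appsPending = userMetrics.getAppsPending();
  this.appsRunning = userMetrics.getAppsRunning();
}
{code}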
[jira] [Updated] (YARN-3351) AppMaster tracking URL is broken in HA
[ https://issues.apache.org/jira/browse/YARN-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3351: -- Fix Version/s: (was: 2.8.0) 2.7.1 This seems like an important fix. I merged this into branch-2.7. AppMaster tracking URL is broken in HA -- Key: YARN-3351 URL: https://issues.apache.org/jira/browse/YARN-3351 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.1 Attachments: YARN-3351.001.patch, YARN-3351.002.patch, YARN-3351.003.patch After YARN-2713, the AppMaster link is broken in HA. To repro: a) set up RM HA and ensure the first RM is not active, b) run a long sleep job and view the tracking URL on the RM applications page. The log and full stack trace are shown below. {noformat} 2015-02-05 20:47:43,478 WARN org.mortbay.log: /proxy/application_1423182188062_0002/: java.net.BindException: Cannot assign requested address {noformat} {noformat} java.net.BindException: Cannot assign requested address at java.net.PlainSocketImpl.socketBind(Native Method) at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376) at java.net.Socket.bind(Socket.java:631) at java.net.Socket.<init>(Socket.java:423) at java.net.Socket.<init>(Socket.java:280) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122) at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:188) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:345) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3472) Possible leak in DelegationTokenRenewer#allTokens
[ https://issues.apache.org/jira/browse/YARN-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3472: -- Target Version/s: 2.7.1 (was: 2.8.0) This seems like an important fix. I merged this into branch-2.7. Possible leak in DelegationTokenRenewer#allTokens -- Key: YARN-3472 URL: https://issues.apache.org/jira/browse/YARN-3472 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith Fix For: 2.7.1 Attachments: 0001-YARN-3472.patch, 0002-YARN-3472.patch When an old token is expiring and being removed, it's not removed from the allTokens map, resulting in a possible leak. {code} if (t.token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) { iter.remove(); t.cancelTimer(); LOG.info("Removed expiring token " + t); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
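A minimal sketch of the corresponding fix, assuming an allTokens map keyed by the token as in the surrounding DelegationTokenRenewer code (the committed patch may differ):
{code}
if (t.token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) {
  iter.remove();
  allTokens.remove(t.token); // also drop the expiring token from the map to plug the leak
  t.cancelTimer();
  LOG.info("Removed expiring token " + t);
}
{code}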
[jira] [Updated] (YARN-3472) Possible leak in DelegationTokenRenewer#allTokens
[ https://issues.apache.org/jira/browse/YARN-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3472: -- Target Version/s: 2.8.0 (was: 2.7.1) Fix Version/s: (was: 2.8.0) 2.7.1 Possible leak in DelegationTokenRenewer#allTokens -- Key: YARN-3472 URL: https://issues.apache.org/jira/browse/YARN-3472 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith Fix For: 2.7.1 Attachments: 0001-YARN-3472.patch, 0002-YARN-3472.patch When an old token is expiring and being removed, it's not removed from the allTokens map, resulting in a possible leak. {code} if (t.token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) { iter.remove(); t.cancelTimer(); LOG.info("Removed expiring token " + t); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Report node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510205#comment-14510205 ] Hadoop QA commented on YARN-3534: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 78 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 32s | The applied patch generated 11 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 40s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 5m 55s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 53m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727766/YARN-3534-2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0b3f895 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7485/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7485/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7485/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7485/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7485/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7485/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7485/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7485/console | This message was automatically generated. 
Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri Attachments: YARN-3534-1.patch, YARN-3534-2.patch Original Estimate: 336h Remaining Estimate: 336h YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510053#comment-14510053 ] Sangjin Lee commented on YARN-3390: --- [~zjshen], you might want to check out [~djp]'s comments and my response in the other JIRA here: https://issues.apache.org/jira/browse/MAPREDUCE-6335?focusedCommentId=14508378&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14508378 I think those were small but useful changes. See this patch for the changes: https://issues.apache.org/jira/secure/attachment/12727521/YARN-3437.004.patch It would be good to preserve those changes. Thanks! Reuse TimelineCollectorManager for RM - Key: YARN-3390 URL: https://issues.apache.org/jira/browse/YARN-3390 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3390.1.patch, YARN-3390.2.patch RMTimelineCollector should have the context info of each app whose entity has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3516) killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
[ https://issues.apache.org/jira/browse/YARN-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510092#comment-14510092 ] Hudson commented on YARN-3516: -- FAILURE: Integrated in Hadoop-trunk-Commit #7656 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7656/]) YARN-3516. killing ContainerLocalizer action doesn't take effect when (xgong: rev 0b3f8957a87ada1a275c9904b211fdbdcefafb02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. --- Key: YARN-3516 URL: https://issues.apache.org/jira/browse/YARN-3516 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix For: 2.8.0 Attachments: YARN-3516.000.patch killing the ContainerLocalizer doesn't take effect when the private localizer receives a FETCH_FAILURE status. This is a typo from YARN-3024. With YARN-3024, the ContainerLocalizer will be killed only if {{action}} is set to {{LocalizerAction.DIE}}; the value set by calling {{response.setLocalizerAction}} will be overwritten. This is also a regression from the old code. It also makes sense to kill the ContainerLocalizer when FETCH_FAILURE happens, because the container will send a CLEANUP_CONTAINER_RESOURCES event after the localization failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510163#comment-14510163 ] Gera Shegalov commented on YARN-2893: - Hi [~zxu], for me personally it's easier to review if you simply make the change and upload a new patch. The additional benefit is that we'll hopefully see whether our assumptions are validated by unit tests. AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2498) Respect labels in preemption policy of capacity scheduler for inter-queue preemption
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510177#comment-14510177 ] Hadoop QA commented on YARN-2498: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 23s | The applied patch generated 4 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 10s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 92m 55s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727748/YARN-2498.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ac281e3 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7482/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7482/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7482/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7482/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7482/console | This message was automatically generated. Respect labels in preemption policy of capacity scheduler for inter-queue preemption Key: YARN-2498 URL: https://issues.apache.org/jira/browse/YARN-2498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: STALED-YARN-2498.zip, YARN-2498.1.patch, YARN-2498.2.patch, YARN-2498.3.patch, YARN-2498.4.patch There are 3 stages in ProportionalCapacityPreemptionPolicy: # Recursively calculate {{ideal_assigned}} for each queue. This depends on the available resource, the resource used/pending in each queue, and the guaranteed capacity of each queue. # Mark to-be-preempted containers: for each over-satisfied queue, it will mark some containers to be preempted. # Notify the scheduler about to-be-preempted containers. We need to respect labels in the cluster for both #1 and #2: For #1, when calculating ideal_assigned for each queue, we need to get the by-partition ideal-assigned according to the queue's guaranteed/maximum/used/pending resource on the specific partition.
For #2, when we make a decision about whether we need to preempt a container, we need to make sure the resource of this container is *possibly* usable by a queue which is under-satisfied and has pending resource. In addition, we need to handle the ignore_partition_exclusivity case: when we need to preempt containers from a queue's partition, we will first preempt ignore_partition_exclusivity-allocated containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri reassigned YARN-3458: - Assignee: Inigo Goiri CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Assignee: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510216#comment-14510216 ] Hadoop QA commented on YARN-3541: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 43s | The applied patch generated 5 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 47s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 2m 49s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | | | 45m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727771/YARN-3541.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bcf89dd | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7487/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7487/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7487/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7487/console | This message was automatically generated. Add version info on timeline service / generic history web UI and REST API - Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3541.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509106#comment-14509106 ] Renan DelValle commented on YARN-2408: -- Hi Nikhil, While I would be glad to finish the development of this feature, the fact is that since being proposed on August 12, 2014 (more than 8 months ago), no member of the Hadoop team has shown an interest in including this feature as part of the main software. Thus, using this feature would mean always having to patch the Hadoop source in use and hoping that nothing breaks in future versions. As Adam pointed out, alternative solutions exist which may let you achieve this in a much more future-proof and painless way, such as the approach Myriad takes (https://github.com/mesos/myriad). That having been said, I'd gladly release the source code for what I have working. As for me, unfortunately, at this time I don't feel it is within my best interests to put forth the time necessary to complete this feature. -Renan Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3477) TimelineClientImpl swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-3477: - Affects Version/s: 2.6.0 Summary: TimelineClientImpl swallows exceptions (was: TimelineClientImpl swallows root cause of retry failures) {{TimelineClientImpl}} also catches InterruptedExceptions and either # converts them to an IOE, making them potentially treated as retries # catches and discards them during a sleep(). Issue #2 means it is impossible to reliably interrupt a thread which is in the attempt-and-retry process of trying to talk to a non-responsive ATS instance. While this does not impact normal operations, it does make it hard to shut down threads talking to ATS. TimelineClientImpl swallows exceptions -- Key: YARN-3477 URL: https://issues.apache.org/jira/browse/YARN-3477 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0, 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran If the timeline client fails more than the retry count, the original exception is not thrown. Instead some runtime exception is raised saying retries ran out. # the failing exception should be rethrown, ideally via NetUtils.wrapException to include the URL of the failing endpoint # Otherwise, the raised RTE should (a) state that URL and (b) set the original fault as the inner cause -- This message was sent by Atlassian JIRA (v6.3.4#6332)
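For contrast, a minimal sketch of an interrupt-friendly retry loop (not the actual TimelineClientImpl code); the interrupt status is preserved and surfaced instead of being converted or discarded:
{code}
IOException lastFailure = null;
for (int attempt = 0; attempt < maxRetries; attempt++) {
  try {
    return doPost(entities); // hypothetical single-attempt call
  } catch (IOException e) {
    lastFailure = e; // remember the root cause for the final rethrow
  }
  try {
    Thread.sleep(retryIntervalMs);
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt(); // preserve the interrupt status
    InterruptedIOException iioe =
        new InterruptedIOException("Interrupted while retrying against ATS");
    iioe.initCause(lastFailure);
    throw iioe;
  }
}
throw new IOException("Gave up after " + maxRetries + " retries", lastFailure);
{code}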
[jira] [Updated] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3431: -- Attachment: YARN-3431.6.patch Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch, YARN-3431.4.patch, YARN-3431.5.patch, YARN-3431.6.patch We have TimelineEntity and some other entities as subclasses that inherit from it. However, we only have a single endpoint, which consumes TimelineEntity rather than the subclasses, and this endpoint checks that the incoming request body contains exactly a TimelineEntity object. The JSON data serialized from a subclass object seems not to be treated as a TimelineEntity object, and it won't be deserialized into the corresponding subclass object, which causes deserialization failures, as in some discussions in YARN-3334: https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508529#comment-14508529 ] Zhijie Shen commented on YARN-3431: --- bq. It would be a little more consistent and perform slightly better if the type check in getChildren() is consolidated into validateChildren(). Refactored the code so that we don't iterate the set twice. bq. maybe we'd like to add some prefix to the fields we (implicitly) add to the info field of an entity? I changed the info keys a bit to make them start with SYSTEM_INFO_. Hopefully this will reduce conflicts. Anyway, we need to identify the system info keys in the documentation to tell users not to use them. Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch, YARN-3431.4.patch, YARN-3431.5.patch, YARN-3431.6.patch We have TimelineEntity and some other entities as subclasses that inherit from it. However, we only have a single endpoint, which consumes TimelineEntity rather than the subclasses, and this endpoint checks that the incoming request body contains exactly a TimelineEntity object. The JSON data serialized from a subclass object seems not to be treated as a TimelineEntity object, and it won't be deserialized into the corresponding subclass object, which causes deserialization failures, as in some discussions in YARN-3334: https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508557#comment-14508557 ] Zhijie Shen commented on YARN-3522: --- I took a look at the checkstyle errors and commented on [HADOOP-11869|https://issues.apache.org/jira/browse/HADOOP-11869?focusedCommentId=14508555&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14508555]. It seems more like noise now. DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch, YARN-3522.2.patch, YARN-3522.3.patch YARN-3287 breaks the timeline access control of distributed shell. In the distributed shell AM:
{code}
if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
  // Creating the Timeline Client
  timelineClient = TimelineClient.createTimelineClient();
  timelineClient.init(conf);
  timelineClient.start();
} else {
  timelineClient = null;
  LOG.warn("Timeline service is not enabled");
}
{code}
{code}
ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() {
  @Override
  public TimelinePutResponse run() throws Exception {
    return timelineClient.putEntities(entity);
  }
});
{code}
YARN-3287 changes the timeline client to get the right ugi at serviceInit, but the DS AM still doesn't use the submitter ugi to init the timeline client; instead it uses the ugi for each putEntities call. This results in the wrong user on the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
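A minimal sketch of the fix direction described above: create and start the timeline client as the submitter, so the UGI captured at serviceInit is the right one. Here appSubmitterUgi is assumed to be the submitter's UserGroupInformation; the committed patch may differ:
{code}
timelineClient = appSubmitterUgi.doAs(
    new PrivilegedExceptionAction<TimelineClient>() {
      @Override
      public TimelineClient run() throws Exception {
        // The client captures the submitter UGI at serviceInit time,
        // so later putEntities calls run as the right user.
        TimelineClient client = TimelineClient.createTimelineClient();
        client.init(conf);
        client.start();
        return client;
      }
    });
{code}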
[jira] [Commented] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508586#comment-14508586 ] Nikhil Mulley commented on YARN-2408: - Hi [~rdelvalle] There are 8 people voting for it and 15 people watching this issue. I am not sure what level of general interest the community requires to move something forward, but I would be happy to help by deploying the patch on my test cluster, giving it a whirl, and seeing where it goes. I am also interested in the REST API as a means to monitor cluster resources in general, to watch slow/starving jobs, and to track the resources requested/consumed per app/job. Nikhil Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
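For the monitoring use case described above, a consumer could be as small as the following sketch. The endpoint path is an assumption based on this proposal, not a shipped YARN API:
{code}
// Hypothetical sketch: poll the proposed endpoint and inspect pending requests.
URL url = new URL("http://rm-host:8088/ws/v1/cluster/resourceRequests"); // assumed path
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Accept", "application/json");
try (BufferedReader in = new BufferedReader(
    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
  StringBuilder body = new StringBuilder();
  String line;
  while ((line = in.readLine()) != null) {
    body.append(line);
  }
  // Parse the JSON counterpart and flag applications whose requests remain
  // unsatisfied across successive polls, i.e. potentially starved apps.
  System.out.println(body);
}
{code}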
[jira] [Resolved] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle resolved YARN-2408. -- Resolution: Done Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508593#comment-14508593 ] Adam B commented on YARN-2408: -- FYI, one of the original use cases (Myriad, to run YARN on Mesos) now just implements the YARN scheduler API directly, so it no longer needs a REST API for resource requests. Other tools may be able to take a similar approach of wrapping a traditional YARN scheduler, but that means that the tool is forced to live on the RM node, in-process. Some tools (especially non-Java tools) will not be able to take this approach. Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3405) FairScheduler's preemption cannot happen between siblings in some cases
[ https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508536#comment-14508536 ] Peng Zhang commented on YARN-3405: -- Updated patch: only preempt from children when the queue is not starved, and added a test case. FairScheduler's preemption cannot happen between siblings in some cases - Key: YARN-3405 URL: https://issues.apache.org/jira/browse/YARN-3405 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Attachments: YARN-3405.01.patch, YARN-3405.02.patch Queue hierarchy described as below: {noformat} root / \ queue-1 queue-2 / \ queue-1-1 queue-1-2 {noformat} Assume the cluster resource is 100. # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fairshare. # When queue-1-2 becomes active, it causes a new preemption request for a fairshare of 25. # When preempting from root, it is possible to find that the preemption candidate is queue-2. If so, preemptContainerPreCheck for queue-2 returns false because it's equal to its fairshare. # Finally queue-1-2 will be waiting for resources released by queue-1-1 itself. What I expect here is that queue-1-2 preempts from queue-1-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
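A minimal sketch of the child-first walk that the patch description implies; the types and helpers are simplified stand-ins, not the actual FairScheduler code:
{code}
// Hypothetical sketch: when a parent is not itself starved, recurse into
// children that are over their own fair share, so a starved sibling
// (queue-1-2) can be served from its over-share sibling (queue-1-1).
long preemptFrom(Queue queue, long toPreempt) {
  if (queue.isLeaf()) {
    return queue.preemptUpTo(toPreempt);
  }
  long preempted = 0;
  for (Queue child : queue.getChildren()) {
    if (preempted >= toPreempt) {
      break;
    }
    if (child.getUsage() > child.getFairShare()) {
      preempted += preemptFrom(child, toPreempt - preempted);
    }
  }
  return preempted;
}
{code}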
[jira] [Resolved] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle resolved YARN-2408. -- Resolution: Won't Fix Work on this feature has been dropped by the original author due to general lack of interest. Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle reopened YARN-2408: -- Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)