[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142931#comment-14142931
 ] 

Karthik Kambatla commented on YARN-2453:


The latest patch looks good. +1. Checking this in.

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch, 
 YARN-2453.002.patch


 TestProportionalCapacityPreemptionPolicy fails with FairScheduler.
 The following is the error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test only passes with the capacity scheduler, because the following 
 source code in ResourceManager.java shows the SchedulingMonitor is created 
 only for it:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 CapacityScheduler is an instance of PreemptableResourceScheduler, while 
 FairScheduler is not.
 I will upload a patch to fix this issue.
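 For reference, a minimal sketch of the direction such a fix could take, 
 assuming the test simply pins the scheduler class so the monitor service is 
 created (the actual committed patch may differ):
 {code}
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
 import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
 import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;

 public class CapacitySchedulerConfSketch {
   // Returns a configuration that forces the CapacityScheduler (a
   // PreemptableResourceScheduler) and enables monitors, so the
   // ResourceManager creates the SchedulingMonitor service that the
   // assertion looks for, regardless of the default scheduler.
   public static YarnConfiguration capacitySchedulerConf() {
     YarnConfiguration conf = new YarnConfiguration();
     conf.setClass(YarnConfiguration.RM_SCHEDULER,
         CapacityScheduler.class, ResourceScheduler.class);
     conf.setBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
     return conf;
   }
 }
 {code}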



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2453) TestProportionalCapacityPreemptionPolicy fails with FairScheduler

2014-09-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2453:
---
Summary: TestProportionalCapacityPreemptionPolicy fails with FairScheduler  
(was: TestProportionalCapacityPreemptionPolicy is failed for FairScheduler)

 TestProportionalCapacityPreemptionPolicy fails with FairScheduler
 -

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch, 
 YARN-2453.002.patch


 TestProportionalCapacityPreemptionPolicy fails with FairScheduler.
 The following is the error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test only passes with the capacity scheduler, because the following 
 source code in ResourceManager.java shows the SchedulingMonitor is created 
 only for it:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 CapacityScheduler is an instance of PreemptableResourceScheduler, while 
 FairScheduler is not.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143017#comment-14143017
 ] 

Hadoop QA commented on YARN-2198:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12670374/YARN-2198.trunk.9.patch
  against trunk revision 9721e2c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5070//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5070//console

This message is automatically generated.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, 
 YARN-2198.trunk.9.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2498) [YARN-796] Respect labels in preemption policy of capacity scheduler

2014-09-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143019#comment-14143019
 ] 

Wangda Tan commented on YARN-2498:
--

Hi [~sunilg],
Many thanks for reviewing this patch. Replies to your feedback:

1)
bq. A scenario where node1 has more than 50% (say 60) of cluster resources, and 
queue A is given 50% in CS. IN that case, is there any chance of under 
utilization?
Yes, queue-A can be under-utilized. By design of YARN-796, this is 
acceptable. We now calculate the real-time maximum resource accessible by 
each queue, and users/admins can get a warning about queue under-utilization 
from the web UI scheduler page.

2)
bq. Here I feel, we may need to split up the resource of label in each node 
level.
It's a very good question; I thought about it again for a while and found a 
counter-example showing you're right:
{code}
node1: x,y
node2: x,y
node3: z

each node has resource 10,
resource tree: 
 total = 30
/|\
   20x   20y  10z

First request 20 resource with label = x
resource tree: 
 total = 10
/|\
   0x   20y  10z

The correct result should be, y = 0, we cannot request resource with label=y.
{code}
So it's best to split up the label resources to the node level, but the problem 
is that this has a much larger time complexity. For each assign operation, we 
need O(n), where n = #unique-label-sets-on-nodes, which can be very large in a 
big cluster. Considering m = #iterations and p = #leaf-queues, we need 
O(n * m * p) to get the ideal_assigned of each queue (a standalone sketch of 
the node-level walk follows below).
There may be a better way to calculate ideal_assigned; I will think about this. 
For now, we can only get a correct ideal_assigned when every node in the 
cluster has at most 1 label. That is the hard-partition use-case (the cluster 
is partitioned into several smaller clusters by label).
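A self-contained illustration of the point above (hypothetical helper code, not 
part of any patch): the naive per-label totals say 20 units of y remain after 
placing 20 units of x, but a node-level walk shows y is exhausted because x and 
y share node1 and node2.
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class LabelResourceSketch {
  public static void main(String[] args) {
    // Node -> labels and node -> free resource, matching the example above.
    Map<String, Set<String>> nodeLabels = new LinkedHashMap<>();
    nodeLabels.put("node1", new HashSet<>(Arrays.asList("x", "y")));
    nodeLabels.put("node2", new HashSet<>(Arrays.asList("x", "y")));
    nodeLabels.put("node3", new HashSet<>(Arrays.asList("z")));
    Map<String, Integer> free = new LinkedHashMap<>();
    nodeLabels.keySet().forEach(n -> free.put(n, 10));

    // Place 20 units with label = x greedily on nodes carrying label x.
    int toPlace = 20;
    for (String n : nodeLabels.keySet()) {
      if (toPlace > 0 && nodeLabels.get(n).contains("x")) {
        int used = Math.min(toPlace, free.get(n));
        free.put(n, free.get(n) - used);
        toPlace -= used;
      }
    }

    // Node-level accounting: capacity still reachable under label y is 0,
    // not the 20 that the per-label cluster total would suggest.
    int freeForY = nodeLabels.keySet().stream()
        .filter(n -> nodeLabels.get(n).contains("y"))
        .mapToInt(free::get).sum();
    System.out.println("free for label y = " + freeForY); // prints 0
  }
}
{code}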

3)
bq. For preemption, we just calculate to match the totalResourceToPreempt from 
the over utilized queues. But whether this container is from which node, and 
also under which label, and whether this label is coming under which queue. Do 
we need to do this check for each container?
I think the answer is yes, if we want every preempted container to be 
accessible by at least one under-satisfied queue (one with ideal_assigned > 
current).

Please let me know if you have more comments,

Thanks,
Wangda

 [YARN-796] Respect labels in preemption policy of capacity scheduler
 

 Key: YARN-2498
 URL: https://issues.apache.org/jira/browse/YARN-2498
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
 yarn-2498-implementation-notes.pdf


 There are 3 stages in ProportionalCapacityPreemptionPolicy:
 # Recursively calculate {{ideal_assigned}} for each queue. This depends on 
 the available resource, the resource used/pending in each queue, and the 
 guaranteed capacity of each queue.
 # Mark to-be-preempted containers: for each over-satisfied queue, mark some 
 containers to be preempted.
 # Notify the scheduler about the to-be-preempted containers.
 We need to respect labels in the cluster for both #1 and #2:
 For #1, when there is some resource available in the cluster, we shouldn't 
 assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
 access the corresponding labels.
 For #2, when we decide whether to preempt a container, we need to make sure 
 the resource of this container is *possibly* usable by a queue which 
 is under-satisfied and has pending resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-09-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143034#comment-14143034
 ] 

Zhijie Shen commented on YARN-1530:
---

bq. Scenario 1. ATS service goes down
bq. Scenario 2. ATS service partially down

In general, I agree that the concerns about the scenario where the timeline 
server is (partially) down make sense. However, if we change the subject from 
ATS to HDFS/Kafka, I'm afraid we reach a similar conclusion. For example, HDFS 
can be temporarily not writable (we have actually observed this issue around 
YARN log aggregation). I can see the judgement has an obvious implication that 
the timeline server can be down but HDFS/Kafka will not, which is correct to 
some extent based on the current timeline server SLA. Therefore, is making the 
timeline server reliable (or always-up) the essential solution? If the timeline 
server is reliable, it relaxes the requirement to persist entities in a third 
place (this is the basic benefit I can see with the HDFS/Kafka channel). While 
it may take a while to make the timeline server as reliable as HDFS/Kafka, we 
can make progress step by step; for example, YARN-2520 should be realistic to 
achieve within a reasonable timeline.

Of course, there may still be a reliability gap between ATS/HBase and 
HDFS/Kafka (actually, I'm not experienced with the reliability of the latter 
components; please let me know what the exact gap is). It could be argued that 
we still need to persist the entities in HDFS/Kafka when ATS/HBase is not 
available but HDFS/Kafka is. However, if we need to improve the timeline 
server's reliability anyway, perhaps we should think carefully about the 
cost/benefit of implementing and maintaining another write channel to bridge 
the gap.

bq. Scenario 3. ATS backing store fails

In this scenario, the entities have already reached the timeline server, right? 
I'm considering it an internal reliability problem of the timeline server. As I 
mentioned in the previous threads, the requirement is that once an entity has 
reached the timeline server, the timeline server should take the responsibility 
to prevent it from being lost. I think it's a good point that the data store 
can have an outage (just as HDFS can be temporarily not writable). Having a 
local backup of the outstanding received requests should be an answer for this 
scenario.

bq. However, with the HDFS channel, the ATS can essentially throttle the events 
Suppose you have a cluster pushing X events/second to the ATS. With the REST 
implementation, the ATS must try to handle X events every second; if it can’t 
keep up, or if it gets too many incoming connections, there’s not too much we 
can do here. 

This may not be an accurate judgement. I suppose you are comparing pushing each 
event in one request over the REST API with writing a batch of X events into 
HDFS. The REST API allows you to batch X events and send a single request; 
please refer to TimelineClient#putEntities for the details.
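A rough, illustrative sketch of such batching over the REST channel (the entity 
and event types here are made-up examples, not real constants): build several 
entities locally and publish them with a single putEntities() call.
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineBatchSketch {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      TimelineEntity[] batch = new TimelineEntity[100];
      for (int i = 0; i < batch.length; i++) {
        TimelineEntity entity = new TimelineEntity();
        entity.setEntityType("EXAMPLE_ENTITY");   // example type, not a real constant
        entity.setEntityId("entity-" + i);
        TimelineEvent event = new TimelineEvent();
        event.setEventType("EXAMPLE_EVENT");
        event.setTimestamp(System.currentTimeMillis());
        entity.addEvent(event);
        batch[i] = entity;
      }
      client.putEntities(batch);                  // one request carries the whole batch
    } finally {
      client.stop();
    }
  }
}
{code}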

bq. In making the write path pluggable, we’d have to have two pieces: one to do 
the writing from the TimelineClient and one to the receiving in the ATS. These 
would have to be in pairs. We’ve already discussed some different 
implementations for this: REST, Kafka, and HDFS.
bq. The backing store is already pluggable. 

No problem, it's feasible to make the write path pluggable. However, though the 
store is pluggable, LevelDB and HBase are relatively similar to each other 
compared to the HTTP REST vs. HDFS/Kafka pair. The more important point is that 
it's more difficult to manage different write channels than to manage different 
stores, because one is client-side and the other is server-side. At the server 
side, the YARN cluster operator has full control of the servers and a limited 
set of hosts to deal with. At the client side, the YARN cluster operator may 
not have access, and doesn't know how many clients and how many types of apps 
he/she needs to deal with. TimelineClient is a generic tool (not for a 
particular application such as Spark), so it's good to keep it lightweight and 
portable. Again, it's still a cost/benefit question.

bq.  Though as bc pointed out before, it’s fine for more experienced users to 
use HBase, but “regular” users should have a solution as well that is hopefully 
more scalable and reliable than LevelDB. 

Right, and this is also my concern about the HDFS/Kafka channel, particularly 
using it as a default. Regular users may not be experienced enough with HBase, 
nor with HDFS/Kafka. It very much depends on the users and the use cases.

[~bcwalrus] and [~rkanter], thanks for putting new ideas into the timeline 
service. In general, the timeline service is still a young project. We have 
different problems to solve and multiple ways to solve them. An additional 
write channel is interesting, while given the whole roadmap 

[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2014-09-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143039#comment-14143039
 ] 

Zhijie Shen commented on YARN-2556:
---

[~jeagles], this sounds like interesting work. Is it possible to see the 
throughput difference between TimelineDataManager and the web front-end 
interface? I suspect the web front-end interface is going to be the bottleneck 
that throttles end-to-end performance. With this analysis, we can have a 
clearer picture of the reasonable number of timeline server instances required 
to get rid of the web front-end interface bottleneck (YARN-2520).

 Tool to measure the performance of the timeline server
 --

 Key: YARN-2556
 URL: https://issues.apache.org/jira/browse/YARN-2556
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: chang li

 We need to be able to understand the capacity model for the timeline server 
 to give users the tools they need to deploy a timeline server with the 
 correct capacity.
 I propose we create a mapreduce job that can measure timeline server write 
 and read performance. Transactions per second, I/O for both read and write 
 would be a good start.
 This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter fails with FairScheduler

2014-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143100#comment-14143100
 ] 

Hudson commented on YARN-2452:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #688 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/688/])
YARN-2452. TestRMApplicationHistoryWriter fails with FairScheduler. (Zhihai Xu 
via kasha) (kasha: rev c50fc92502934aa2a8f84ea2466d4da1e3eace9d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java


 TestRMApplicationHistoryWriter fails with FairScheduler
 ---

 Key: YARN-2452
 URL: https://issues.apache.org/jira/browse/YARN-2452
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.6.0

 Attachments: YARN-2452.000.patch, YARN-2452.001.patch, 
 YARN-2452.002.patch


 TestRMApplicationHistoryWriter fails with FairScheduler. The failure is 
 the following:
 T E S T S
 ---
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 66.261 sec   FAILURE!
 java.lang.AssertionError: expected:1 but was:200
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy fails with FairScheduler

2014-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143099#comment-14143099
 ] 

Hudson commented on YARN-2453:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #688 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/688/])
YARN-2453. TestProportionalCapacityPreemptionPolicy fails with FairScheduler. 
(Zhihai Xu via kasha) (kasha: rev 9721e2c1feb5aecea3a6dab5bda96af1cd0f8de3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java


 TestProportionalCapacityPreemptionPolicy fails with FairScheduler
 -

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.6.0

 Attachments: YARN-2453.000.patch, YARN-2453.001.patch, 
 YARN-2453.002.patch


 TestProportionalCapacityPreemptionPolicy fails with FairScheduler.
 The following is the error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test only passes with the capacity scheduler, because the following 
 source code in ResourceManager.java shows the SchedulingMonitor is created 
 only for it:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 CapacityScheduler is an instance of PreemptableResourceScheduler, while 
 FairScheduler is not.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2551) Windows Secure Container Executor: Add checks to validate that the wsce-site.xml is write restricted to Administrators only

2014-09-22 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2551:
---
Attachment: YARN-2551.1.patch

 Windows Secure Container Executor: Add checks to validate that the 
 wsce-site.xml is write restricted to Administrators only
 ---

 Key: YARN-2551
 URL: https://issues.apache.org/jira/browse/YARN-2551
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows, wsce
 Attachments: YARN-2551.1.patch


 The wsce-site.xml contains the impersonate.allowed and impersonate.denied 
 keys that restrict/control the users that can be impersonated by the WSCE 
 containers. The impersonation framework in winutils should validate that 
 only Administrators have write control on this file. 
 This is similar to how LCE validates that only root has write permission 
 on the container-executor.cfg file on secure Linux clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2551) Windows Secure Container Executor: Add checks to validate that the wsce-site.xml is write restricted to Administrators only

2014-09-22 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu resolved YARN-2551.

Resolution: Implemented

The patch will be included in YARN-2198 patch 10 and onward.

 Windows Secure Container Executor: Add checks to validate that the 
 wsce-site.xml is write restricted to Administrators only
 ---

 Key: YARN-2551
 URL: https://issues.apache.org/jira/browse/YARN-2551
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows, wsce
 Attachments: YARN-2551.1.patch


 The wsce-site.xml contains the impersonate.allowed and impersonate.denied 
 keys that restrict/control the users that can be impersonated by the WSCE 
 containers. The impersonation framework in winutils should validate that 
 only Administrators have write control on this file. 
 This is similar to how LCE validates that only root has write permission 
 on the container-executor.cfg file on secure Linux clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-09-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143158#comment-14143158
 ] 

Tsuyoshi OZAWA commented on YARN-2312:
--

[~vinodkv], [~jianhe], do you have any feedback?

[~jlowe], I'd appreciate it if you could give us comments about WrappedJvmID. 

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch


 After YARN-2229, {{ContainerId#getId}} will only return a partial value of 
 the container ID (only the sequence number, without the epoch). We should 
 mark {{ContainerId#getId}} as deprecated and use 
 {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2579) Both RMs' state is Active, but 1 RM is not really active.

2014-09-22 Thread Rohith (JIRA)
Rohith created YARN-2579:


 Summary: Both RMs' state is Active, but 1 RM is not really active.
 Key: YARN-2579
 URL: https://issues.apache.org/jira/browse/YARN-2579
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Rohith


I encountered a situation where both RMs' web pages were accessible and their 
state was displayed as Active, but one of the RMs' ActiveServices was stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2579) Both RMs' state is Active, but 1 RM is not really active.

2014-09-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143200#comment-14143200
 ] 

Rohith commented on YARN-2579:
--

This scenario can occur if 2 threads try to call 
ResourceManager#transitionToStandby(): one from 
AdminService#transitionToStandby first, and then from 
RMFatalEventDispatcher#transitionToStandBy(). I simulated this using a debug 
point.
The main problem is that resetting the dispatcher stops the dispatcher. If 
AdminService is stopping the dispatcher while the dispatcher thread is blocked 
trying to acquire the lock on ResourceManager, then the ResourceManager never 
gets transitioned to standby; it waits forever. A simplified sketch of the 
lock-up follows after the thread dump below.

{code}
AsyncDispatcher event handler daemon prio=10 tid=0x007ea000 
nid=0x39d1 waiting for monitor entry [0x7fe0a77f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:976)
- waiting to lock 0xc1f7d438 (a 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher.handle(ResourceManager.java:701)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher.handle(ResourceManager.java:678)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
IPC Server handler 0 on 45021 daemon prio=10 tid=0x7fe0a9026800 
nid=0x30ab in Object.wait() [0x7fe0a7cfa000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xeb3310e8 (a java.lang.Thread)
at java.lang.Thread.join(Thread.java:1281)
- locked 0xeb3310e8 (a java.lang.Thread)
at java.lang.Thread.join(Thread.java:1355)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:150)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
- locked 0xeb32fef8 (a java.lang.Object)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.resetDispatcher(ResourceManager.java:1166)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:987)
- locked 0xc1f7d438 (a 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToStandby(AdminService.java:308)
- locked 0xc2038d10 (a 
org.apache.hadoop.yarn.server.resourcemanager.AdminService)
at 
org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToStandby(HAServiceProtocolServerSideTranslatorPB.java:119)
at 
org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4462)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{code}
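
Concretely (hypothetical code, not the actual RM classes): a synchronized 
transition holds the RM lock and joins the dispatcher thread, while the 
dispatcher thread is blocked waiting for that same lock.
{code}
public class StandbyDeadlockSketch {
  private final Object rmLock = new Object();
  private Thread dispatcherThread;

  void startDispatcher() {
    dispatcherThread = new Thread(() -> {
      // RMFatalEventDispatcher path: needs the RM lock to transition to standby.
      synchronized (rmLock) {
        // transitionToStandby() work would happen here.
      }
    });
    dispatcherThread.start();
  }

  void adminTransitionToStandby() throws InterruptedException {
    synchronized (rmLock) {      // AdminService path: holds the RM lock...
      dispatcherThread.join();   // ...and waits for the dispatcher thread,
    }                            // which is blocked on rmLock above -> deadlock.
  }
}
{code}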


 Both RMs' state is Active, but 1 RM is not really active.
 --

 Key: YARN-2579
 URL: https://issues.apache.org/jira/browse/YARN-2579
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Rohith

 I encountered a situation where both RMs' web pages were accessible and 
 their state was displayed as Active, but one of the RMs' ActiveServices was 
 stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy fails with FairScheduler

2014-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143235#comment-14143235
 ] 

Hudson commented on YARN-2453:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1879 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1879/])
YARN-2453. TestProportionalCapacityPreemptionPolicy fails with FairScheduler. 
(Zhihai Xu via kasha) (kasha: rev 9721e2c1feb5aecea3a6dab5bda96af1cd0f8de3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/CHANGES.txt


 TestProportionalCapacityPreemptionPolicy fails with FairScheduler
 -

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.6.0

 Attachments: YARN-2453.000.patch, YARN-2453.001.patch, 
 YARN-2453.002.patch


 TestProportionalCapacityPreemptionPolicy fails with FairScheduler.
 The following is the error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test only passes with the capacity scheduler, because the following 
 source code in ResourceManager.java shows the SchedulingMonitor is created 
 only for it:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 CapacityScheduler is an instance of PreemptableResourceScheduler, while 
 FairScheduler is not.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter fails with FairScheduler

2014-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143236#comment-14143236
 ] 

Hudson commented on YARN-2452:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1879 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1879/])
YARN-2452. TestRMApplicationHistoryWriter fails with FairScheduler. (Zhihai Xu 
via kasha) (kasha: rev c50fc92502934aa2a8f84ea2466d4da1e3eace9d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 TestRMApplicationHistoryWriter fails with FairScheduler
 ---

 Key: YARN-2452
 URL: https://issues.apache.org/jira/browse/YARN-2452
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.6.0

 Attachments: YARN-2452.000.patch, YARN-2452.001.patch, 
 YARN-2452.002.patch


 TestRMApplicationHistoryWriter fails with FairScheduler. The failure is 
 the following:
 T E S T S
 ---
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 66.261 sec   FAILURE!
 java.lang.AssertionError: expected:1 but was:200
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2580) Windows Secure Container Executor: grant job query privileges to the container user

2014-09-22 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned YARN-2580:
--

Assignee: Remus Rusanu

 Windows Secure Container Executor: grant job query privileges to the 
 container user
 ---

 Key: YARN-2580
 URL: https://issues.apache.org/jira/browse/YARN-2580
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu

 mapred.MapTask.initialize uses WindowsBasedProcessTree, which uses winutils 
 to query the container NT job. This must be granted query permission by 
 hadoopwinutilsvc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2580) Windows Secure Container Executor: grant job query privileges to the container user

2014-09-22 Thread Remus Rusanu (JIRA)
Remus Rusanu created YARN-2580:
--

 Summary: Windows Secure Container Executor: grant job query 
privileges to the container user
 Key: YARN-2580
 URL: https://issues.apache.org/jira/browse/YARN-2580
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu


mapred.MapTask.initialize uses WindowsBasedProcessTree, which uses winutils to 
query the container NT job. This must be granted query permission by 
hadoopwinutilsvc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-09-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143301#comment-14143301
 ] 

Wangda Tan commented on YARN-1198:
--

Hi [~cwelch],
Sorry for the late response; I've just looked at your ver.8 patch and comments.
My replies:
bq. -re we don't need write HeadroomProvider for each scheduler 
And 
bq. Provider vs Reference
I agree with this. I think we need to write different HeadroomProviders, and 
it's better to keep the Provider since it's more general.

bq. -re As mentioned by Jason, currently ...
Agreed, this can be done in a separate JIRA.

bq. -re the cost of the calculation
Agreed, it's just a small computational effort.

In the past, I suggested doing what I mentioned at 
https://issues.apache.org/jira/browse/YARN-1198?focusedCommentId=14108991page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14108991
 because I thought that would make the code cleaner.
But looking at your ver.8 patch, I realized that may not be doable. In 
LeafQueue#computeUserLimit, it uses required to get the user limit. In your 
patch, you save the lastRequired to the user class. However, we need a 
different required for different apps under the same user. We can only do the 
calculation when an app heartbeats (we could also loop and set all apps' 
headroom, but that's an approach we abandoned before). 
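For reference, a very rough sketch of the usual formulation being discussed, 
with illustrative parameter names (not the actual LeafQueue fields): recompute 
an app's headroom from the current user limit and queue usage at heartbeat 
time, rather than handing out a stale cached value.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class HeadroomSketch {
  static Resource computeHeadroom(Resource userLimit, Resource userConsumed,
      Resource queueMax, Resource queueUsed) {
    // Headroom is bounded both by what the user may still take and by what
    // is left in the queue.
    Resource userHeadroom = Resources.subtract(userLimit, userConsumed);
    Resource queueHeadroom = Resources.subtract(queueMax, queueUsed);
    return Resources.componentwiseMin(userHeadroom, queueHeadroom);
  }
}
{code}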

So basically, IMHO, I think your ver.7 is the more correct way to go, which 
keeps complexity and efficiency balanced. 
Any thoughts? [~jianhe], [~cwelch].

Wangda

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
 YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, 
 YARN-1198.8.patch


 Today headroom calculation (for the app) takes place only when:
 * A new node is added/removed from the cluster
 * A new container is getting assigned to the application.
 However, there are potentially a lot of situations which are not considered 
 for this calculation:
 * If a container finishes, then the headroom for that application will change 
 and should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then:
 ** If app1's container finishes, then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly, if a container is assigned to either application app1/app2, 
 then both AMs should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per user per LeafQueue, so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but then this is not going to be backward compatible..)
 * Also, when an admin refreshes the queues, headroom has to be updated.
 These are all potential bugs in headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy fails with FairScheduler

2014-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143386#comment-14143386
 ] 

Hudson commented on YARN-2453:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1904 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1904/])
YARN-2453. TestProportionalCapacityPreemptionPolicy fails with FairScheduler. 
(Zhihai Xu via kasha) (kasha: rev 9721e2c1feb5aecea3a6dab5bda96af1cd0f8de3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* hadoop-yarn-project/CHANGES.txt


 TestProportionalCapacityPreemptionPolicy fails with FairScheduler
 -

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.6.0

 Attachments: YARN-2453.000.patch, YARN-2453.001.patch, 
 YARN-2453.002.patch


 TestProportionalCapacityPreemptionPolicy fails with FairScheduler.
 The following is the error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test only passes with the capacity scheduler, because the following 
 source code in ResourceManager.java shows the SchedulingMonitor is created 
 only for it:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 CapacityScheduler is an instance of PreemptableResourceScheduler, while 
 FairScheduler is not.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter fails with FairScheduler

2014-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143387#comment-14143387
 ] 

Hudson commented on YARN-2452:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1904 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1904/])
YARN-2452. TestRMApplicationHistoryWriter fails with FairScheduler. (Zhihai Xu 
via kasha) (kasha: rev c50fc92502934aa2a8f84ea2466d4da1e3eace9d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java


 TestRMApplicationHistoryWriter fails with FairScheduler
 ---

 Key: YARN-2452
 URL: https://issues.apache.org/jira/browse/YARN-2452
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.6.0

 Attachments: YARN-2452.000.patch, YARN-2452.001.patch, 
 YARN-2452.002.patch


 TestRMApplicationHistoryWriter fails with FairScheduler. The failure is 
 the following:
 T E S T S
 ---
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 66.261 sec   FAILURE!
 java.lang.AssertionError: expected:1 but was:200
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-09-22 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143401#comment-14143401
 ] 

bc Wong commented on YARN-1530:
---

Hi [~zjshen]. First, glad to see that we're discussing approaches. You seem to 
agree with the premise that *ATS write path should not slow down apps*.

bq. Therefore, is making the timeline server reliable (or always-up) the 
essential solution? If the timeline server is reliable, ...

In theory, you can make the ATS *always-up*. In practice, we both know what 
real life distributed systems do. Always-up isn't the only thing. The write 
path needs to have good uptime and latency regardless of what's happening to 
the read path or the backing store.

HDFS is a good default for the write channel because:
* We don't have to design an ATS that is always-up. If you really want to, I'm 
sure you can eventually build something with good uptime. But it took other 
projects (HDFS, ZK) lots of hard work to get to that point.
* If we reuse HDFS, cluster admins know how to operate HDFS and get good uptime 
from it. But it'll take training and hard-learned lessons for operators to 
figure out how to get good uptime from ATS, even after you build an always-up 
ATS.
* All the popular YARN app frameworks (MR, Spark, etc.) already rely on HDFS by 
default. So do most of the 3rd party applications that I know of. 
Architecturally, it seems easier for admins to accept that ATS write path 
depends on HDFS for reliability, instead of a new component that (we claim) 
will be as reliable as HDFS/ZK.

bq. given the whole roadmap of the timeline service, let's think critically of 
work that can improve the timeline service most significantly.

Exactly. Strong +1. If we can drop the high uptime + low write latency 
requirement from the ATS service, we can avoid tons of effort. ATS doesn't need 
to be as reliable as HDFS. We don't need to worry about insulating the write 
path from the read path. We don't need to worry about occasional hiccups in 
HBase (or whatever the store is). And at the end of all this, everybody sleeps 
better because ATS service going down isn't a catastrophic failure.

 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
 ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
 application timeline design-20140116.pdf, application timeline 
 design-20140130.pdf, application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to store and serve per-framework 
 data all by itself, as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner, with plugin points for frameworks 
 to do their own thing w.r.t. interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2581) NMs need to find a way to get LogAggregationContext

2014-09-22 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2581:
---

 Summary: NMs need to find a way to get LogAggregationContext
 Key: YARN-2581
 URL: https://issues.apache.org/jira/browse/YARN-2581
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong


After YARN-2569, we have a LogAggregationContext for the application in 
ApplicationSubmissionContext. NMs need to find a way to get this information.
We have this requirement: all containers in the same application should honor 
the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143460#comment-14143460
 ] 

Allen Wittenauer commented on YARN-913:
---

bq. Summary: need to fix ZK client and then have curator configure it, so the 
rest of us don't have to care.

This might be a blocker then.  If a client needs to talk to more than one ZK, 
it sounds like they are basically screwed.

bq. do you mean in the endpoint fields? It should ... let me clarify that in 
the example.

I was mainly looking at the hostname pattern:
{code}
+  String HOSTNAME_PATTERN =
+      "([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])";
{code}

It doesn't appear to support periods/dots.
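
A quick, self-contained check of that observation (illustrative only):
{code}
import java.util.regex.Pattern;

public class HostnamePatternCheck {
  // The pattern quoted above; note the character class has no '.'.
  static final String HOSTNAME_PATTERN =
      "([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])";

  public static void main(String[] args) {
    System.out.println(Pattern.matches(HOSTNAME_PATTERN, "worker01"));            // true
    System.out.println(Pattern.matches(HOSTNAME_PATTERN, "worker01.example.com")); // false
  }
}
{code}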

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2582) Log related CLI and Web UI changes for LRS

2014-09-22 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2582:
---

 Summary: Log related CLI and Web UI changes for LRS
 Key: YARN-2582
 URL: https://issues.apache.org/jira/browse/YARN-2582
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong


After YARN-2468, we have changed the log layout to support log aggregation for 
Long Running Services. The log CLI and related web UI should be modified 
accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143473#comment-14143473
 ] 

Jian He commented on YARN-1372:
---

+1 for the latest patch,  committing

 Ensure all completed containers are reported to the AMs across RM restart
 -

 Key: YARN-1372
 URL: https://issues.apache.org/jira/browse/YARN-1372
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
 YARN-1372.002_NMHandlesCompletedApp.patch, 
 YARN-1372.002_RMHandlesCompletedApp.patch, 
 YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
 YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
 YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
 YARN-1372.009.patch, YARN-1372.009.patch, YARN-1372.010.patch, 
 YARN-1372.prelim.patch, YARN-1372.prelim2.patch


 Currently the NM informs the RM about completed containers and then removes 
 those containers from the RM notification list. The RM passes on that 
 completed container information to the AM and the AM pulls this data. If the 
 RM dies before the AM pulls this data then the AM may not be able to get this 
 information again. To fix this, NM should maintain a separate list of such 
 completed container notifications sent to the RM. After the AM has pulled the 
 containers from the RM then the RM will inform the NM about it and the NM can 
 remove the completed container from the new list. Upon re-register with the 
 RM (after RM restart) the NM should send the entire list of completed 
 containers to the RM along with any other containers that completed while the 
 RM was dead. This ensures that the RM can inform the AM's about all completed 
 containers. Some container completions may be reported more than once since 
 the AM may have pulled the container but the RM may die before notifying the 
 NM about the pull.
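 A hypothetical sketch of the NM-side bookkeeping described above (illustrative 
 names, not the actual NodeManager classes): completed containers stay in a 
 pending list until the RM confirms the AM has pulled them, and the whole list 
 is re-sent on re-register after an RM restart.
 {code}
 import java.util.Collection;
 import java.util.HashSet;
 import java.util.Set;
 import java.util.concurrent.ConcurrentHashMap;

 import org.apache.hadoop.yarn.api.records.ContainerId;

 public class CompletedContainerTracker {
   private final Set<ContainerId> pendingAck = ConcurrentHashMap.newKeySet();

   // Called when a container finishes on this NM.
   void onContainerCompleted(ContainerId id) {
     pendingAck.add(id);
   }

   // Included in every heartbeat and in the re-register request after RM restart.
   Set<ContainerId> containersToReport() {
     return new HashSet<>(pendingAck);
   }

   // Called when the RM confirms the AM has pulled these container statuses.
   void onAckFromRM(Collection<ContainerId> acked) {
     pendingAck.removeAll(acked);
   }
 }
 {code}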



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-09-22 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2583:
---

 Summary: Modify the LogDeletionService to support Log aggregation 
for LRS
 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong


Currently, AggregatedLogDeletionService will delete old logs from HDFS. It will 
directly delete the app-log-dir from HDFS. This will not work for LRS. We 
expect an LRS application to keep running for a long time. Deleting the 
app-log-dir for LRS applications is not the right way to handle it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-09-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-2583:
---

Assignee: Xuan Gong

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong

 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will directly delete the app-log-dir from HDFS. This will not work for LRS. 
 We expect an LRS application to keep running for a long time. Deleting the 
 app-log-dir for LRS applications is not the right way to handle it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143495#comment-14143495
 ] 

Steve Loughran commented on YARN-2554:
--

Vinod, this patch is independent of kerberos, secure AMs, etc.

This patch is to allow any AM to export an HTTPS URL; you can't do this on a 
secure or insecure cluster today.

It doesn't mean that clients can trust something just because it is on HTTPS; 
that's an independent issue. 

 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
 Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, 
 YARN-2554.3.patch


 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized correctly with the necessary certs to allow for 
 successful one way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).   I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays unknown_certificate exception
 RM:  Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target
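
 For reference, a minimal sketch of wiring the client-side SSLFactory into an 
 outbound HTTPS call, illustrating the suggestion above; it is not the attached 
 patch, the target URL is hypothetical, and the proxy servlet would need the 
 equivalent hook on its HttpClient (exception handling simplified):
 {code}
 // Sketch: client-mode SSLFactory loads the truststore from ssl-client.xml, and its
 // socket factory / hostname verifier are applied to the outbound HTTPS connection.
 int fetchViaSsl(Configuration conf, URL amUrl) throws Exception {   // amUrl is illustrative
   SSLFactory sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
   sslFactory.init();
   try {
     HttpsURLConnection conn = (HttpsURLConnection) amUrl.openConnection();
     conn.setSSLSocketFactory(sslFactory.createSSLSocketFactory());
     conn.setHostnameVerifier(sslFactory.getHostnameVerifier());
     return conn.getResponseCode();
   } finally {
     sslFactory.destroy();
   }
 }
 {code}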



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-09-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143499#comment-14143499
 ] 

Xuan Gong commented on YARN-2468:
-

This is a very big patch and is hard to review. I would like to split the 
big patch into several smaller patches:
1) API changes will be tracked by YARN-2569
2) NMs need to find a way to get the LogAggregationContext. This will be tracked by 
YARN-2581
3) The current ticket will be used to track the changes for NM handling of the logs for 
LRS, which will include the log layout changes
4) Log Deletion Service changes will be tracked by YARN-2583
5) Related CLI and web UI changes will be tracked by YARN-2582

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
 YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
 YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
 YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch


 Currently, when an application is finished, the NM will start the log 
 aggregation. But for long-running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch > 0 after YARN-2182

2014-09-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143567#comment-14143567
 ] 

Jian He commented on YARN-2562:
---

Patch looks good, thanks Tsuyoshi! Could you add a brief comment in the toString 
method that the epoch will increase if the RM restarts or fails over?

 ContainerId@toString() is unreadable for epoch > 0 after YARN-2182
 -

 Key: YARN-2562
 URL: https://issues.apache.org/jira/browse/YARN-2562
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-2562.1.patch


 ContainerID string format is unreadable for RMs that restarted at least once 
 (epoch > 0) after YARN-2182. E.g., 
 container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-09-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143594#comment-14143594
 ] 

Zhijie Shen commented on YARN-1530:
---

Hi, [~bcwalrus]. Thanks for your further comments.

bq. You seem to agree with the premise that ATS write path should not slow down 
apps.

Definitely. The arguable point is whether the current timeline client is going to 
slow down the app, given a scalable and reliable timeline server.

bq. If we can drop the high uptime + low write latency requirement from the ATS 
service, we can avoid tons of effort.

I'm not sure such fundamental requirements can be dropped from the timeline 
service. Projecting into the future, scalable and highly available timeline servers 
have multiple benefits and enable different use cases. For example,

1. We can use it to serve realtime or near-realtime data, such that we can go to 
the timeline server to see what is happening to an application. It's particularly 
useful for long running services, which never shut down.

2. We can build checkpoints on the timeline server for the app to do recovery 
once it crashes. It's pretty much like what we've done for MR jobs.

I bundled scalable and reliable together because a multiple-instance solution 
will improve the timeline server in both dimensions.

Moreover, no matter how scalable and reliable the channel could be, we 
eventually want to get the timeline data into the timeline server, 
right? Otherwise, it is not going to be accessible by users (of course, tricks 
can be played to fetch it directly from HDFS, but that's a completely different 
story from the timeline server). If the apps are publishing 10GB of data per hour, 
while the server can only process 1GB per hour, the 9GB of outstanding data per hour 
that resides in some temp location on HDFS is going to be useless writes.

We have narrowed the discussion down very much to the reliability of the write 
path, but if we look at the big picture, *the timeline server is not just a place to 
store data, but also serves it to users* (e.g., YARN-2513). In terms of use 
cases, users may want to monitor completed apps as well as running apps and the 
cluster. If the timeline server doesn't have the capacity to serve the data for a 
particular use case, it's actually wasting the cost of aggregating it. IMHO, 
a scalable and reliable timeline server is going to be *the eventual 
solution to satisfy multiple stakeholders*, regardless of whether the use case is 
read intensive, write intensive or both. That's why I think improving the timeline 
server could be high-margin work. It may be hard work, but 
we should definitely pick it up.


 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
 ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
 application timeline design-20140116.pdf, application timeline 
 design-20140130.pdf, application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to do store, and serve per-framework 
 data all by itself as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner with plugin points for frameworks 
 to do their own thing w.r.t interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-09-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2583:

Description: 
Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
checks the cut-off time, and if all logs for an application are older than this 
cut-off time, the app-log-dir is deleted from HDFS. This will not work for 
LRS. We expect an LRS application to keep running for a long time. 
Two different scenarios: 
1) If we configured the rollingIntervalSeconds, new log files will always be 
uploaded to HDFS. The number of log files for this application will become 
larger and larger, and no log files will ever be deleted.
2) If we did not configure the rollingIntervalSeconds, the log file can only be 
uploaded to HDFS after the application is finished. It is very possible that 
the logs are uploaded after the cut-off time. This will cause problems because by 
that time the app-log-dir for this application in HDFS has already been deleted.
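
A rough sketch of the per-file deletion the two scenarios above imply (delete only 
aggregated log files older than the cut-off, never the whole app-log-dir while the 
app is still running); the method shape, names and retention check are assumptions, 
not the eventual patch:
{code}
// Sketch only: remove individual rolled log files that have aged past the
// retention window, leaving the app-log-dir (and newer files) in place.
void deleteOldLogFiles(FileSystem fs, Path appLogDir, long retainMillis)
    throws IOException {
  long cutoff = System.currentTimeMillis() - retainMillis;
  for (FileStatus file : fs.listStatus(appLogDir)) {
    if (!file.isDirectory() && file.getModificationTime() < cutoff) {
      fs.delete(file.getPath(), false);
    }
  }
}
{code}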

  was:Currently, AggregatedLogDeletionService will delete old logs from HDFS. 
It will directly delete the app-log-dir from HDFS. This will not work for LRS. 
We expect a LRS application can keep running for a long time. Deleting the 
app-log-dir for the LRS applications is not a right way to handle it.


 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong

 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 checks the cut-off time, and if all logs for an application are older than 
 this cut-off time, the app-log-dir is deleted from HDFS. This will not 
 work for LRS. We expect an LRS application to keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, new log files will always 
 be uploaded to HDFS. The number of log files for this application will 
 become larger and larger, and no log files will ever be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off time. This will cause problems 
 because by that time the app-log-dir for this application in HDFS has already 
 been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2494) [YARN-796] Node label manager API and storage implementations

2014-09-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143665#comment-14143665
 ] 

Craig Welch commented on YARN-2494:
---

The other day [~vinodkv] suggested changing the addLabel, removeLabel, ... APIs 
to addNodeLabel, removeNodeLabel, ... to make them clearer (and presumably make 
adding other possible types of labels in the future smoother).  This would not 
affect the label APIs; the node-to-label ones are OK already, I think.  
Thoughts?  
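
A hypothetical sketch of what the rename would look like on the manager class; 
the signatures below are illustrative assumptions, not the patch:
{code}
// Illustrative only: label-set operations gain the "NodeLabel" prefix, while the
// node-to-label mapping methods keep their existing style.
public abstract class NodeLabelManager {
  public abstract void addNodeLabels(Set<String> labels) throws IOException;
  public abstract void removeNodeLabels(Set<String> labels) throws IOException;

  // node-to-label APIs stay as they are
  public abstract void setLabelsOnNode(String node, Set<String> labels) throws IOException;
  public abstract Set<String> getLabelsOnNode(String node);
}
{code}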

 [YARN-796] Node label manager API and storage implementations
 -

 Key: YARN-2494
 URL: https://issues.apache.org/jira/browse/YARN-2494
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, 
 YARN-2494.patch


 This JIRA includes APIs and storage implementations of the node label manager.
 NodeLabelManager is an abstract class used to manage labels of nodes in the 
 cluster; it has APIs to query/modify
 - Nodes according to a given label
 - Labels according to a given hostname
 - Add/remove labels
 - Set labels of nodes in the cluster
 - Persist/recover changes of labels/labels-on-nodes to/from storage
 And it has two implementations to store modifications
 - Memory based storage: it will not persist changes, so all labels will be 
 lost when the RM restarts
 - FileSystem based storage: it will persist/recover to/from a FileSystem (like 
 HDFS), and all labels and labels-on-nodes will be recovered upon RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-09-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143699#comment-14143699
 ] 

Jason Lowe commented on YARN-2312:
--

bq. One idea is to add id for upper 32 bits of container Id to ID class.

The ID class is used by much more than just JvmID objects.  I'm not a fan of 
making all IDs pay for this extra storage when we only need it for this one 
case.  I'd rather store the extra bits in JvmID.

Actually I don't think it's critical that JvmID derives from ID.  We could have 
JvmID store the long itself rather than try to hack an extra 4 bytes onto ID 
and then need to explain why JvmID.getId doesn't do what one would expect.
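
A hypothetical sketch of that second suggestion (JVMId carrying the full 64-bit id 
itself instead of widening the shared ID base class); the fields and accessor names 
are assumptions for illustration only:
{code}
// Illustrative only: keep the complete container id, epoch bits included, in
// JVMId itself rather than adding 4 bytes to every ID instance.
public class JVMId {
  private final JobID jobId;
  private final boolean isMap;
  private final long jvmId;   // full 64-bit value from ContainerId#getContainerId()

  public JVMId(JobID jobId, boolean isMap, long jvmId) {
    this.jobId = jobId;
    this.isMap = isMap;
    this.jvmId = jvmId;
  }

  public long getJvmId() {
    return jvmId;
  }
}
{code}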

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch


 {{ContainerId#getId}} will only return a partial value of the containerId (only 
 the sequence number of the container id, without the epoch) after YARN-2229. We 
 should mark {{ContainerId#getId}} as deprecated and use 
 {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-09-22 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2320:
--
Attachment: YARN-2320.1.patch

 Removing old application history store after we store the history data to 
 timeline store
 

 Key: YARN-2320
 URL: https://issues.apache.org/jira/browse/YARN-2320
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2320.1.patch


 After YARN-2033, we should deprecate the application history store set. There's 
 no need to maintain two sets of store interfaces. In addition, we should 
 conclude the outstanding JIRAs under YARN-321 about the application history 
 store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-09-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143783#comment-14143783
 ] 

Zhijie Shen commented on YARN-2320:
---

Uploaded a huge patch. It doesn't have complex logic; it just removes the old 
application history store stack, including:

1. Null|Memory|FileSystemApplicationHistoryStore, the related protobuf classes, 
and the ApplicationHistoryManagerImpl based on them.
2. RMApplicationHistoryWriter, the events used by it, and its invocations within 
the scope of the RM.
3. Unnecessary configurations in YarnConfiguration.

In addition, I've fixed the test cases based on ApplicationHistoryStore, and 
renamed ApplicationHistoryManagerOnTimelineStore to 
ApplicationHistoryManagerImpl.

 Removing old application history store after we store the history data to 
 timeline store
 

 Key: YARN-2320
 URL: https://issues.apache.org/jira/browse/YARN-2320
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2320.1.patch


 After YARN-2033, we should deprecate the application history store set. There's 
 no need to maintain two sets of store interfaces. In addition, we should 
 conclude the outstanding JIRAs under YARN-321 about the application history 
 store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI

2014-09-22 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143793#comment-14143793
 ] 

Ashwin Shankar commented on YARN-2540:
--

Hi [~kasha], when you get a chance, can you please review/commit the latest 
patch?

 Fair Scheduler : queue filters not working on scheduler page in RM UI
 -

 Key: YARN-2540
 URL: https://issues.apache.org/jira/browse/YARN-2540
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0, 2.5.1
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt, YARN-2540-v3.txt


 Steps to reproduce :
 1. Run an app in default queue.
 2. While the app is running, go to the scheduler page on RM UI.
 3. You would see the app in the apptable at the bottom.
 4. Now click on default queue to filter the apptable on root.default.
 5. App disappears from apptable although it is running on default queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-09-22 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2320:
--
Attachment: YARN-2320.2.patch

Removed the unnecessary proto file as well.

 Removing old application history store after we store the history data to 
 timeline store
 

 Key: YARN-2320
 URL: https://issues.apache.org/jira/browse/YARN-2320
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2320.1.patch, YARN-2320.2.patch


 After YARN-2033, we should deprecate the application history store set. There's 
 no need to maintain two sets of store interfaces. In addition, we should 
 conclude the outstanding JIRAs under YARN-321 about the application history 
 store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) [YARN-796] Support get/add/remove/change labels in RM REST API

2014-09-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143855#comment-14143855
 ] 

Craig Welch commented on YARN-2505:
---

1) 
-re renaming all-nodes-to-labels to nodes-to-labels - done

-re node-filter, I don't think that it makes sense to switch it.  While 
code-wise I see where it is awkward to do a value filter, this follows the spec 
and it makes sense from a use case perspective - I expect that the desire is to 
find all of the nodes which have a particular label on them; that is the 
purpose of this filter, and it makes sense to me that someone would want to do 
that, and it seems to fit in with this API.  I think there are easier ways to 
see what labels are on a node; adding that as a filter to this kind of API 
call makes little sense to me anyway, as it is more-or-less a direct property of 
a node - if it's missing, I think it belongs elsewhere anyway.

Have shortened lines where found.



 [YARN-796] Support get/add/remove/change labels in RM REST API
 --

 Key: YARN-2505
 URL: https://issues.apache.org/jira/browse/YARN-2505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Craig Welch
 Attachments: YARN-2505.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2505) [YARN-796] Support get/add/remove/change labels in RM REST API

2014-09-22 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2505:
--
Attachment: YARN-2505.1.patch

 [YARN-796] Support get/add/remove/change labels in RM REST API
 --

 Key: YARN-2505
 URL: https://issues.apache.org/jira/browse/YARN-2505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Craig Welch
 Attachments: YARN-2505.1.patch, YARN-2505.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails

2014-09-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143890#comment-14143890
 ] 

Karthik Kambatla commented on YARN-2578:


Would it be possible to add a test case for this?

 NM does not failover timely if RM node network connection fails
 ---

 Key: YARN-2578
 URL: https://issues.apache.org/jira/browse/YARN-2578
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
 Attachments: YARN-2578.patch


 The NM does not fail over correctly when the network cable of the RM is 
 unplugged or the failure is simulated by a service network stop or a 
 firewall that drops all traffic on the node. The RM fails over to the standby 
 node when the failure is detected as expected. The NM should then re-register 
 with the new active RM. This re-register takes a long time (15 minutes or 
 more). Until then the cluster has no nodes for processing and applications 
 are stuck.
 Reproduction test case which can be used in any environment:
 - create a cluster with 3 nodes
 node 1: ZK, NN, JN, ZKFC, DN, RM, NM
 node 2: ZK, NN, JN, ZKFC, DN, RM, NM
 node 3: ZK, JN, DN, NM
 - start all services and make sure they are in good health
 - kill the network connection of the RM that is active using one of the 
 network kills from above
 - observe the NN and RM failover
 - the DN's fail over to the new active NN
 - the NM does not recover for a long time
 - the logs show a long delay and traces show no change at all
 The stack traces of the NM all show the same set of threads. The main thread 
 which should be used in the re-register is the Node Status Updater. This 
 thread is stuck in:
 {code}
 Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
 Object.wait() [0x7f5a51fc1000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
   - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.Client.call(Client.java:1362)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
 {code}
 The client connection which goes through the proxy can be traced back to the 
 ResourceTrackerPBClientImpl. The generated proxy does not time out and we 
 should be using a version which takes the RPC timeout (from the 
 configuration) as a parameter.
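
 To illustrate that last point, a sketch of constructing the ResourceTracker proxy 
 through an RPC.getProxy overload that accepts an rpcTimeout; the configuration key 
 and default value here are assumptions, not the attached patch:
 {code}
 // Sketch only: pass an explicit rpcTimeout so the nodeHeartbeat call cannot
 // block forever when the RM host silently drops traffic.
 ResourceTrackerPB createProxy(Configuration conf, InetSocketAddress rmAddress)
     throws IOException {
   RPC.setProtocolEngine(conf, ResourceTrackerPB.class, ProtobufRpcEngine.class);
   int rpcTimeout = conf.getInt("yarn.resourcetracker.rpc-timeout-ms", 60000);  // assumed key
   return RPC.getProxy(ResourceTrackerPB.class,
       RPC.getProtocolVersion(ResourceTrackerPB.class), rmAddress,
       UserGroupInformation.getCurrentUser(), conf,
       NetUtils.getDefaultSocketFactory(conf), rpcTimeout);
 }
 {code}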



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2540) FairScheduler: Queue filters not working on scheduler page in RM UI

2014-09-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2540:
---
Summary: FairScheduler: Queue filters not working on scheduler page in RM 
UI  (was: Fair Scheduler : queue filters not working on scheduler page in RM UI)

 FairScheduler: Queue filters not working on scheduler page in RM UI
 ---

 Key: YARN-2540
 URL: https://issues.apache.org/jira/browse/YARN-2540
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0, 2.5.1
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt, YARN-2540-v3.txt


 Steps to reproduce :
 1. Run an app in default queue.
 2. While the app is running, go to the scheduler page on RM UI.
 3. You would see the app in the apptable at the bottom.
 4. Now click on default queue to filter the apptable on root.default.
 5. App disappears from apptable although it is running on default queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2569) Log Handling for LRS API Changes

2014-09-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2569:

Attachment: YARN-2569.4.patch

fix all the latest comments

 Log Handling for LRS API Changes
 

 Key: YARN-2569
 URL: https://issues.apache.org/jira/browse/YARN-2569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
 YARN-2569.4.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails

2014-09-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143951#comment-14143951
 ] 

Vinod Kumar Vavilapalli commented on YARN-2578:
---

bq. The NM does not fail over correctly when the network cable of the RM is 
unplugged or the failure is simulated by a service network stop or a firewall 
that drops all traffic on the node. The RM fails over to the standby node when 
the failure is detected as expected.
I am surprised that the RM itself fails over (in the context of a firewall rule 
that drops traffic) - we never implemented health monitoring like ZKFC does for 
HDFS. It seems like if the RPC port gets blocked the RM will not fail over, as the 
embedded ZK continues to use the local loopback and so doesn't detect the 
network failure.

 NM does not failover timely if RM node network connection fails
 ---

 Key: YARN-2578
 URL: https://issues.apache.org/jira/browse/YARN-2578
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
 Attachments: YARN-2578.patch


 The NM does not fail over correctly when the network cable of the RM is 
 unplugged or the failure is simulated by a service network stop or a 
 firewall that drops all traffic on the node. The RM fails over to the standby 
 node when the failure is detected as expected. The NM should then re-register 
 with the new active RM. This re-register takes a long time (15 minutes or 
 more). Until then the cluster has no nodes for processing and applications 
 are stuck.
 Reproduction test case which can be used in any environment:
 - create a cluster with 3 nodes
 node 1: ZK, NN, JN, ZKFC, DN, RM, NM
 node 2: ZK, NN, JN, ZKFC, DN, RM, NM
 node 3: ZK, JN, DN, NM
 - start all services and make sure they are in good health
 - kill the network connection of the RM that is active using one of the 
 network kills from above
 - observe the NN and RM failover
 - the DN's fail over to the new active NN
 - the NM does not recover for a long time
 - the logs show a long delay and traces show no change at all
 The stack traces of the NM all show the same set of threads. The main thread 
 which should be used in the re-register is the Node Status Updater. This 
 thread is stuck in:
 {code}
 Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
 Object.wait() [0x7f5a51fc1000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
   - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.Client.call(Client.java:1362)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
 {code}
 The client connection which goes through the proxy can be traced back to the 
 ResourceTrackerPBClientImpl. The generated proxy does not time out and we 
 should be using a version which takes the RPC timeout (from the 
 configuration) as a parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1959) Fix headroom calculation in Fair Scheduler

2014-09-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1959:

Attachment: YARN-1959.001.patch

Addressed feedback

 Fix headroom calculation in Fair Scheduler
 --

 Key: YARN-1959
 URL: https://issues.apache.org/jira/browse/YARN-1959
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sandy Ryza
Assignee: Anubhav Dhoot
 Attachments: YARN-1959.001.patch, YARN-1959.prelim.patch


 The Fair Scheduler currently always sets the headroom to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch > 0 after YARN-2182

2014-09-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143964#comment-14143964
 ] 

Vinod Kumar Vavilapalli commented on YARN-2562:
---

How about {{container_e17_1410901177871_0001_01_05}}? A number at the end 
for me always points to the container-id. We also don't need to be verbose 
with the epoch. And we can still parse it in a backwards-compatible fashion.

If nothing else, my fourth preference is to have something like 
{{container_1410901177871_0001_01_05_e17}}; the first three preferences are 
what I proposed above :P
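
For illustration, a sketch of the first preference, assuming (per YARN-2229) that 
the epoch lives in the upper bits of the 64-bit container id; the bit split and 
zero-padding widths below are assumptions, not the committed format:
{code}
// Sketch only: print "e<epoch>_" right after the "container_" prefix, and only
// when the epoch is non-zero, so pre-restart ids keep their familiar form.
public String toString() {
  ApplicationAttemptId attemptId = getApplicationAttemptId();
  ApplicationId appId = attemptId.getApplicationId();
  long epoch = getContainerId() >>> 40;              // assumed upper-bit layout
  long sequence = getContainerId() & 0xFFFFFFFFFFL;  // assumed lower 40 bits
  StringBuilder sb = new StringBuilder("container_");
  if (epoch > 0) {
    sb.append("e").append(epoch).append("_");
  }
  sb.append(appId.getClusterTimestamp()).append("_")
    .append(String.format("%04d", appId.getId())).append("_")
    .append(String.format("%02d", attemptId.getAttemptId())).append("_")
    .append(String.format("%06d", sequence));
  return sb.toString();
}
{code}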

 ContainerId@toString() is unreadable for epoch > 0 after YARN-2182
 -

 Key: YARN-2562
 URL: https://issues.apache.org/jira/browse/YARN-2562
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-2562.1.patch


 ContainerID string format is unreadable for RMs that restarted at least once 
 (epoch > 0) after YARN-2182. E.g., 
 container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2540) FairScheduler: Queue filters not working on scheduler page in RM UI

2014-09-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143967#comment-14143967
 ] 

Karthik Kambatla commented on YARN-2540:


Verified the patch fixes the issue on a pseudo-dist cluster. +1. Committing 
this. 

 FairScheduler: Queue filters not working on scheduler page in RM UI
 ---

 Key: YARN-2540
 URL: https://issues.apache.org/jira/browse/YARN-2540
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0, 2.5.1
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt, YARN-2540-v3.txt


 Steps to reproduce :
 1. Run an app in default queue.
 2. While the app is running, go to the scheduler page on RM UI.
 3. You would see the app in the apptable at the bottom.
 4. Now click on default queue to filter the apptable on root.default.
 5. App disappears from apptable although it is running on default queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-09-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143974#comment-14143974
 ] 

Jason Lowe commented on YARN-90:


Thanks, Varun!  Comments on the latest patch:

It's a bit odd to have a hash map mapping disk error types to lists of 
directories and fill them all in, when in practice we only look at one type in 
the map, and that's DISK_FULL.  It'd be simpler (and faster, and less space 
since there's no hashmap involved) to just track full disks as a separate 
collection like we already do for localDirs and failedDirs.
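
A tiny sketch of that bookkeeping (field names and list type are assumed):
{code}
// Illustrative only: keep full disks in their own list alongside the existing
// good and failed lists, instead of a map keyed by error type.
private final List<String> localDirs = new ArrayList<String>();
private final List<String> failedDirs = new ArrayList<String>();
private final List<String> fullDirs = new ArrayList<String>();   // new
{code}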

Nit: DISK_ERROR_CAUSE should be DiskErrorCause (if we keep the enum) to match 
the style of other enum types in the code.

In verifyDirUsingMkdir, if an error occurs during the finally clause then that 
exception will mask the original exception.
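
A small sketch of the concern, with a cleanup step that cannot throw; the probe 
name and LOG field are assumptions, not the patch:
{code}
// Illustrative only: File#delete() returns a boolean instead of throwing, so a
// cleanup failure in the finally block cannot mask the original exception.
private static void verifyDirUsingMkdir(File dir) throws IOException {
  File probe = new File(dir, "probe-" + System.currentTimeMillis());  // assumed probe name
  try {
    if (!probe.mkdir()) {
      throw new IOException("Could not create " + probe);
    }
  } finally {
    if (probe.exists() && !probe.delete()) {
      LOG.warn("Failed to delete probe dir " + probe);
    }
  }
}
{code}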

isDiskUsageUnderPercentageLimit is named backwards.  Disk usage being under the 
configured limit shouldn't be a full-disk error, and the error message is 
inconsistent with the method name (the method talks about being under but the 
error message says it's above).
{code}
if (isDiskUsageUnderPercentageLimit(testDir)) {
  msg =
      "used space above threshold of "
          + diskUtilizationPercentageCutoff
          + "%, removing from the list of valid directories.";
{code}

We should only call getDisksHealthReport() once in the following code:
{code}
+String report = getDisksHealthReport();
+if (!report.isEmpty()) {
+  LOG.info("Disk(s) failed. " + getDisksHealthReport());
{code}
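i.e., something along the lines of (reusing the report that was just fetched):
{code}
String report = getDisksHealthReport();
if (!report.isEmpty()) {
  LOG.info("Disk(s) failed. " + report);
}
{code}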

Should updateDirsAfterTest always say "Disk(s) failed" if the report isn't 
empty?  Thinking of the case where two disks go bad, then one later is 
restored.  The health report will still have something, but that last update is 
a disk turning good, not failing.  Before, this code was only called when a new 
disk failed, and now that's not always the case.  Maybe it should just be 
something like "Disk health update: " instead?

Is it really necessary to stat a directory before we try to delete it?  Seems 
like we can just try to delete it.

The idiom of getting the directories and adding the full directories seems 
pretty common.  Might be good to have dirhandler methods that already do this, 
like getLocalDirsForCleanup or getLogDirsForCleanup.

I'm a bit worried that getInitializedLocalDirs could potentially try to delete 
an entire directory tree for a disk.  If this fails in some sector-specific way 
but other containers are currently using their files from other sectors just 
fine on the same disk, removing these files from underneath active containers 
could be very problematic and difficult to debug.

 NodeManager should identify failed disks becoming good back again
 -

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
Assignee: Varun Vasudev
 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
 YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
 apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch


 MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), NodeManager needs restart. This JIRA is to improve NodeManager to 
 reuse good disks(which could be bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2540) FairScheduler: Queue filters not working on scheduler page in RM UI

2014-09-22 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143980#comment-14143980
 ] 

Ashwin Shankar commented on YARN-2540:
--

Thanks [~kasha], [~ywskycn] !

 FairScheduler: Queue filters not working on scheduler page in RM UI
 ---

 Key: YARN-2540
 URL: https://issues.apache.org/jira/browse/YARN-2540
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0, 2.5.1
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Fix For: 2.6.0

 Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt, YARN-2540-v3.txt


 Steps to reproduce :
 1. Run an app in default queue.
 2. While the app is running, go to the scheduler page on RM UI.
 3. You would see the app in the apptable at the bottom.
 4. Now click on default queue to filter the apptable on root.default.
 5. App disappears from apptable although it is running on default queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143993#comment-14143993
 ] 

Craig Welch commented on YARN-2496:
---

So, re the headroom issue (2)  - the short version - I don't think we can put 
off addressing this, because I think it is going to be a typical case and will 
be problematic.  I think the most realistic solution is to support only a short 
list of pre-configured label expressions per queue.  Another option is to limit 
nodes to supporting only 1 label per node (which, realistically, might be 
sufficient).  A third option is to limit the number of labels which a queue can 
access to a very small value + the all value (1-2).  Basically, one of the 
factors pushing the large set of possible values which must be considered to 
properly calculate headroom needs to be made finite/drastically reduced.

longer version...
I don't think we should move forward without addressing it.  I say this because 
I think it is likely to be a typical situation to have a queue which has more 
than one label associated with it - most likely, the simple case of a queue 
which can address all nodes, some of which have a label and some of which do 
not.  Jobs entering these queues using a restrictive label expression will hit 
this headroom issue - it's especially true in cases where there are fewer 
resources, which is what one would expect from a small set of special 
machines (e.g. the typical node label case).  It's important to make sure headroom 
is correctly handled as we add node labels, and as things stand, we know it is 
not.

I'm afraid it is something of a design issue: allowing arbitrary node label 
expressions with multiple labels on queues, etc., is leading to something of a 
combinatorial explosion.  It may be that the right solution is to narrow the 
feature set a bit for this iteration.  We could choose to only support a 
restricted set of expressions on a given queue.  This could even mean only 
supporting the default label expression - I'm concerned that this may be too 
restrictive, and so we would need to support a set of expressions.  This 
could then be a finite list which is pre-calculated.  I think, in practical 
terms, this will probably meet people's needs.  A second option is to restrict 
the number of labels supported on a queue; a small enough set could be 
pre-calculated for all possibilities.  I'm suspicious of this latter option, 
though; it would have to be a very small number of labels to be manageable, and 
I think it reduces, realistically, to the restricted set of expressions.  

I also don't see any performant way to support arbitrary node label expressions 
on every request with unlimited labels per queue and node - things as they are. 
 It appears to me you would need to keep track of all resource values for the 
intersection of all label combinations.  If we limited the number of possible 
labels on a node to one, then we could calculate based on expressions at runtime 
(possibly for a very small number > 1, but again, growth is exponential, I 
believe, and functionally complex).


 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch


 This JIRA includes:
 - Add/parse a labels option in {{capacity-scheduler.xml}} similar to other 
 options of a queue like capacity/maximum-capacity, etc.
 - Include a default-label-expression option in the queue config; if an app 
 doesn't specify a label-expression, the default-label-expression of the queue 
 will be used.
 - Check if labels can be accessed by the queue when submitting an app with a 
 label-expression to the queue or updating a ResourceRequest with a label-expression
 - Check labels on the NM when trying to allocate a ResourceRequest on the NM 
 with a label-expression
 - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2539) FairScheduler: Update the default value for maxAMShare

2014-09-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143997#comment-14143997
 ] 

Karthik Kambatla commented on YARN-2539:


+1

 FairScheduler: Update the default value for maxAMShare
 --

 Key: YARN-2539
 URL: https://issues.apache.org/jira/browse/YARN-2539
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2539-1.patch


 Currently, the maxAMShare per queue is -1 by default, which disables the AM 
 share constraint. Changing it to 0.5f would be good.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) [YARN-796] Support get/add/remove/change labels in RM REST API

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143998#comment-14143998
 ] 

Hadoop QA commented on YARN-2505:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670512/YARN-2505.1.patch
  against trunk revision 23e17ce.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5073//console

This message is automatically generated.

 [YARN-796] Support get/add/remove/change labels in RM REST API
 --

 Key: YARN-2505
 URL: https://issues.apache.org/jira/browse/YARN-2505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Craig Welch
 Attachments: YARN-2505.1.patch, YARN-2505.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2539) FairScheduler: Set the default value for maxAMShare to 0.5

2014-09-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2539:
---
Summary: FairScheduler: Set the default value for maxAMShare to 0.5  (was: 
FairScheduler: Update the default value for maxAMShare)

 FairScheduler: Set the default value for maxAMShare to 0.5
 --

 Key: YARN-2539
 URL: https://issues.apache.org/jira/browse/YARN-2539
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2539-1.patch


 Currently, the maxAMShare per queue is -1 by default, which disables the AM 
 share constraint. Changing it to 0.5f would be good.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144004#comment-14144004
 ] 

Hadoop QA commented on YARN-2578:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670359/YARN-2578.patch
  against trunk revision 23e17ce.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5071//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5071//console

This message is automatically generated.

 NM does not failover timely if RM node network connection fails
 ---

 Key: YARN-2578
 URL: https://issues.apache.org/jira/browse/YARN-2578
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
 Attachments: YARN-2578.patch


 The NM does not fail over correctly when the network cable of the RM is 
 unplugged or the failure is simulated by a service network stop or a 
 firewall that drops all traffic on the node. The RM fails over to the standby 
 node when the failure is detected as expected. The NM should then re-register 
 with the new active RM. This re-register takes a long time (15 minutes or 
 more). Until then the cluster has no nodes for processing and applications 
 are stuck.
 Reproduction test case which can be used in any environment:
 - create a cluster with 3 nodes
 node 1: ZK, NN, JN, ZKFC, DN, RM, NM
 node 2: ZK, NN, JN, ZKFC, DN, RM, NM
 node 3: ZK, JN, DN, NM
 - start all services and make sure they are in good health
 - kill the network connection of the RM that is active using one of the 
 network kills from above
 - observe the NN and RM failover
 - the DN's fail over to the new active NN
 - the NM does not recover for a long time
 - the logs show a long delay and traces show no change at all
 The stack traces of the NM all show the same set of threads. The main thread 
 which should be used in the re-register is the Node Status Updater. This 
 thread is stuck in:
 {code}
 Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
 Object.wait() [0x7f5a51fc1000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
   - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.Client.call(Client.java:1362)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
 {code}
 The client connection which goes through the proxy can be traced back to the 
 ResourceTrackerPBClientImpl. The generated proxy does not time out and we 
 should be using a version which takes the RPC timeout (from the 
 configuration) as a parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2129) Add scheduling priority to the WindowsSecureContainerExecutor

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144006#comment-14144006
 ] 

Hadoop QA commented on YARN-2129:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649565/YARN-2129.2.patch
  against trunk revision 43efdd3.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5074//console

This message is automatically generated.

 Add scheduling priority to the WindowsSecureContainerExecutor
 -

 Key: YARN-2129
 URL: https://issues.apache.org/jira/browse/YARN-2129
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2129.1.patch, YARN-2129.2.patch


 The WCE (YARN-1972) could and should honor 
 NM_CONTAINER_EXECUTOR_SCHED_PRIORITY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails

2014-09-22 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144016#comment-14144016
 ] 

Wilfred Spiegelenburg commented on YARN-2578:
-

To address [~vinodkv]'s comments: the active RM is completely shut off from the 
network, as are all the other services on the node, including ZooKeeper. The RM 
can update ZooKeeper, but that will never be propagated outside of the node to 
the other ZooKeeper nodes, so it cannot be seen by the standby RM. The 
standby RM detects no updates in ZooKeeper for the timeout period and becomes 
the active node. That is the normal HA behaviour from the standby node, as if 
the RM had crashed.


 NM does not failover timely if RM node network connection fails
 ---

 Key: YARN-2578
 URL: https://issues.apache.org/jira/browse/YARN-2578
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
 Attachments: YARN-2578.patch


 The NM does not fail over correctly when the network cable of the RM is 
 unplugged or the failure is simulated by a service network stop or a 
 firewall that drops all traffic on the node. The RM fails over to the standby 
 node when the failure is detected as expected. The NM should then re-register 
 with the new active RM. This re-register takes a long time (15 minutes or 
 more). Until then the cluster has no nodes for processing and applications 
 are stuck.
 Reproduction test case which can be used in any environment:
 - create a cluster with 3 nodes
 node 1: ZK, NN, JN, ZKFC, DN, RM, NM
 node 2: ZK, NN, JN, ZKFC, DN, RM, NM
 node 3: ZK, JN, DN, NM
 - start all services and make sure they are in good health
 - kill the network connection of the RM that is active using one of the 
 network kills from above
 - observe the NN and RM failover
 - the DN's fail over to the new active NN
 - the NM does not recover for a long time
 - the logs show a long delay and traces show no change at all
 The stack traces of the NM all show the same set of threads. The main thread 
 which should be used in the re-register is the Node Status Updater. This 
 thread is stuck in:
 {code}
 Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
 Object.wait() [0x7f5a51fc1000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
   - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.Client.call(Client.java:1362)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
 {code}
 The client connection which goes through the proxy can be traced back to the 
 ResourceTrackerPBClientImpl. The generated proxy does not time out and we 
 should be using a version which takes the RPC timeout (from the 
 configuration) as a parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails

2014-09-22 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144022#comment-14144022
 ] 

Wilfred Spiegelenburg commented on YARN-2578:
-

I looked into automated testing, but like in HDFS-4858 I have not been able to 
find a way to test this using JUnit tests. Manual testing is really simple 
using the reproduction scenario above.

 NM does not failover timely if RM node network connection fails
 ---

 Key: YARN-2578
 URL: https://issues.apache.org/jira/browse/YARN-2578
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
 Attachments: YARN-2578.patch


 The NM does not fail over correctly when the network cable of the RM is 
 unplugged or the failure is simulated by a service network stop or a 
 firewall that drops all traffic on the node. The RM fails over to the standby 
 node when the failure is detected as expected. The NM should then re-register 
 with the new active RM. This re-register takes a long time (15 minutes or 
 more). Until then the cluster has no nodes for processing and applications 
 are stuck.
 Reproduction test case which can be used in any environment:
 - create a cluster with 3 nodes
 node 1: ZK, NN, JN, ZKFC, DN, RM, NM
 node 2: ZK, NN, JN, ZKFC, DN, RM, NM
 node 3: ZK, JN, DN, NM
 - start all services make sure they are in good health
 - kill the network connection of the RM that is active using one of the 
 network kills from above
 - observe the NN and RM failover
 - the DN's fail over to the new active NN
 - the NM does not recover for a long time
 - the logs show a long delay and traces show no change at all
 The stack traces of the NM all show the same set of threads. The main thread 
 that should perform the re-registration is the Node Status Updater. This 
 thread is stuck in:
 {code}
 Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
 Object.wait() [0x7f5a51fc1000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
   - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.Client.call(Client.java:1362)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
 {code}
 The client connection which goes through the proxy can be traced back to the 
 ResourceTrackerPBClientImpl. The generated proxy does not time out, and we 
 should be using a version that takes the RPC timeout (from the 
 configuration) as a parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-09-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144037#comment-14144037
 ] 

Tsuyoshi OZAWA commented on YARN-2312:
--

Talked with Jian offline.

{quote}
2. Priority. Can we change the definition of Proto? It's used widely and one 
concern is backward compatibility.
{quote}

The Priority class is used with ContainerId#getId only in test code (e.g. 
ApplicationHistoryStoreTestUtils). We can leave it for now.

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch


 After YARN-2229, {{ContainerId#getId}} will only return a partial value of the 
 containerId: the sequence number of the container id without the epoch. We 
 should mark {{ContainerId#getId}} as deprecated and use 
 {{ContainerId#getContainerId}} instead.
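 A minimal sketch of the intended API shape (illustrative only; annotations and 
 javadoc in the actual patch may differ):
 {code}
 // The old accessor keeps working but is flagged, pointing callers at the new one.
 @Deprecated
 public abstract int getId();            // low bits only; loses the epoch after YARN-2229

 public abstract long getContainerId();  // full container id, including the epoch bits
 {code}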



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler

2014-09-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144043#comment-14144043
 ] 

Karthik Kambatla commented on YARN-1959:


Thanks Anubhav. 

Thought about this a little more, and I wonder if we need to have separate 
headroom calculations for policies. Would DRF#getHeadroom not work for other 
policies? 

 Fix headroom calculation in Fair Scheduler
 --

 Key: YARN-1959
 URL: https://issues.apache.org/jira/browse/YARN-1959
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sandy Ryza
Assignee: Anubhav Dhoot
 Attachments: YARN-1959.001.patch, YARN-1959.prelim.patch


 The Fair Scheduler currently always sets the headroom to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144053#comment-14144053
 ] 

Hadoop QA commented on YARN-2320:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670492/YARN-2320.2.patch
  against trunk revision 23e17ce.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 23 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.yarn.server.TestContainerManagerSecurity

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5072//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5072//console

This message is automatically generated.

 Removing old application history store after we store the history data to 
 timeline store
 

 Key: YARN-2320
 URL: https://issues.apache.org/jira/browse/YARN-2320
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2320.1.patch, YARN-2320.2.patch


 After YARN-2033, we should deprecate the application history store set. There's 
 no need to maintain two sets of store interfaces. In addition, we should 
 conclude the outstanding JIRAs under YARN-321 about the application history 
 store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144075#comment-14144075
 ] 

Hadoop QA commented on YARN-2569:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670525/YARN-2569.4.patch
  against trunk revision 43efdd3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5075//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5075//console

This message is automatically generated.

 Log Handling for LRS API Changes
 

 Key: YARN-2569
 URL: https://issues.apache.org/jira/browse/YARN-2569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
 YARN-2569.4.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler

2014-09-22 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144076#comment-14144076
 ] 

Anubhav Dhoot commented on YARN-1959:
-

The queue fair share for the FIFO and fair policies always sets CPU to zero. Thus, 
using the DRF calculation would cause the headroom to always report zero CPU. 
That can be incorrectly interpreted by the user as having no CPU headroom 
(instead of "don't care about CPU headroom"). 
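A rough sketch of that distinction (illustrative only; the inputs are hypothetical 
locals, not the patch):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative only, not YARN-1959 itself. For memory-only policies (FifoPolicy,
// FairSharePolicy) the fair share carries no meaningful CPU figure, so only memory
// is bounded by the share; CPU headroom falls back to what the cluster has free.
public final class HeadroomSketch {
  static Resource memoryOnlyHeadroom(Resource queueFairShare, Resource queueUsage,
      Resource clusterAvailable) {
    int mem = Math.min(queueFairShare.getMemory() - queueUsage.getMemory(),
        clusterAvailable.getMemory());
    int cpu = clusterAvailable.getVirtualCores();   // "don't care", not zero
    return Resources.createResource(Math.max(mem, 0), Math.max(cpu, 0));
  }
}
{code}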

 Fix headroom calculation in Fair Scheduler
 --

 Key: YARN-1959
 URL: https://issues.apache.org/jira/browse/YARN-1959
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sandy Ryza
Assignee: Anubhav Dhoot
 Attachments: YARN-1959.001.patch, YARN-1959.prelim.patch


 The Fair Scheduler currently always sets the headroom to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-09-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144087#comment-14144087
 ] 

Eric Payne commented on YARN-2056:
--

[~leftnoteasy]: Good catch! It's actually even worse than what you specified. 
The way the patch is written now, if the preemption-disabled queue is 1) over 
capacity and 2) asking for more resources, it will preempt from other queues 
and make them go below their guarantee!

I don't have a better suggestion for the problem you have outlined than 
stating the following:
If a queue is over capacity and has untouchable resources in its pool, it 
cannot preempt other queues at that level. In other words, if you disable 
preemption on a queue, the only way it will get over its capacity is when 
other resources free up. Those other resources won't be preempted to fulfill a 
non-preemptable queue's request if that non-preemptable queue is already over 
capacity.
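A sketch of that rule (illustrative Java only; these names do not match the 
ProportionalCapacityPreemptionPolicy internals):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative guard: a preemption-disabled queue that is already at or over its
// guarantee must not demand preemption from siblings; it only grows when resources
// free up on their own.
public final class PreemptionGuardSketch {
  static boolean mayDemandPreemption(boolean preemptionDisabled, Resource used,
      Resource guaranteed, Resource clusterResource, ResourceCalculator rc) {
    if (!preemptionDisabled) {
      return true;                 // normal queues can always ask for preemption
    }
    return Resources.lessThan(rc, clusterResource, used, guaranteed);
  }
}
{code}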

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
 YARN-2056.201409181916.txt, YARN-2056.201409210049.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144090#comment-14144090
 ] 

Craig Welch commented on YARN-796:
--

It looks like the FileSystemNodeLabelManager will just append changes to the 
edit log forever until it is restarted; is that correct? If so, a 
long-running cluster with lots of changes could end up with a rather large edit 
log. I think that every so many writes (N writes) a recovery should be forced to 
clean up the edit log and consolidate state (do a recover...)
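Roughly something like the following (illustrative only; the type and method names 
are made up, not the actual FileSystemNodeLabelManager API):
{code}
import java.io.IOException;

// Illustrative sketch of periodic compaction of the node-label edit log.
public abstract class CompactingEditLogSketch {
  private static final int COMPACTION_INTERVAL = 1000;   // "N writes"
  private int writesSinceCompaction = 0;

  synchronized void logChange(Object op) throws IOException {
    writeToEditLog(op);                       // append the mutation as today
    if (++writesSinceCompaction >= COMPACTION_INTERVAL) {
      // Same work as recovery on restart: replay the log onto the mirror,
      // write a fresh mirror file, then truncate the edit log.
      recoverAndCompact();
      writesSinceCompaction = 0;
    }
  }

  abstract void writeToEditLog(Object op) throws IOException;    // hypothetical
  abstract void recoverAndCompact() throws IOException;          // hypothetical
}
{code}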

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144099#comment-14144099
 ] 

Hadoop QA commented on YARN-1959:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670524/YARN-1959.001.patch
  against trunk revision 43efdd3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5076//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5076//console

This message is automatically generated.

 Fix headroom calculation in Fair Scheduler
 --

 Key: YARN-1959
 URL: https://issues.apache.org/jira/browse/YARN-1959
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sandy Ryza
Assignee: Anubhav Dhoot
 Attachments: YARN-1959.001.patch, YARN-1959.prelim.patch


 The Fair Scheduler currently always sets the headroom to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler

2014-09-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144111#comment-14144111
 ] 

Karthik Kambatla commented on YARN-1959:


Thanks for the clarification here and offline. I understand why the headroom 
needs to be policy-specific. Couple of nits:
# In FifoPolicy and FairSharePolicy, we can avoid one instance of Resource 
({{queueAvailable}}) and use an int for memory instead. Maybe we should just 
use two ints in DRFPolicy as well.
# TestFSAppAttempt#VerifyHeadroom should be verifyHeadroom.



 Fix headroom calculation in Fair Scheduler
 --

 Key: YARN-1959
 URL: https://issues.apache.org/jira/browse/YARN-1959
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sandy Ryza
Assignee: Anubhav Dhoot
 Attachments: YARN-1959.001.patch, YARN-1959.prelim.patch


 The Fair Scheduler currently always sets the headroom to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1959) Fix headroom calculation in Fair Scheduler

2014-09-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1959:

Attachment: YARN-1959.002.patch

Addressed feedback

 Fix headroom calculation in Fair Scheduler
 --

 Key: YARN-1959
 URL: https://issues.apache.org/jira/browse/YARN-1959
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sandy Ryza
Assignee: Anubhav Dhoot
 Attachments: YARN-1959.001.patch, YARN-1959.002.patch, 
 YARN-1959.prelim.patch


 The Fair Scheduler currently always sets the headroom to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2168) SCM/Client/NM/Admin protocols

2014-09-22 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144152#comment-14144152
 ] 

Chris Trezzo commented on YARN-2168:


Thanks for the comments [~vinodkv]. I will make changes to reflect all of these 
comments in the appropriate sub-patches.

 SCM/Client/NM/Admin protocols
 -

 Key: YARN-2168
 URL: https://issues.apache.org/jira/browse/YARN-2168
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch


 This JIRA is meant to review the main shared cache APIs. They are 
 as follows:
 * ClientSCMProtocol - The protocol between the yarn client and the cache 
 manager. This protocol controls how resources in the cache are claimed and 
 released.
 ** UseSharedCacheResourceRequest
 ** UseSharedCacheResourceResponse
 ** ReleaseSharedCacheResourceRequest
 ** ReleaseSharedCacheResourceResponse
 * SCMAdminProtocol - This is an administrative protocol for the cache 
 manager. It allows administrators to manually trigger cleaner runs.
 ** RunSharedCacheCleanerTaskRequest
 ** RunSharedCacheCleanerTaskResponse
 * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the 
 cache manager. This allows the NodeManager to coordinate with the cache 
 manager when uploading new resources to the shared cache.
 ** NotifySCMRequest
 ** NotifySCMResponse
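 For reference, a rough Java shape for one of these protocols (illustrative only; 
 the method names here are assumptions, and the real signatures live in the 
 sub-patches):
 {code}
 import java.io.IOException;
 import org.apache.hadoop.yarn.exceptions.YarnException;

 // Illustrative sketch of ClientSCMProtocol; the request/response types are the
 // ones listed above, while the method names are guesses.
 public interface ClientSCMProtocol {
   UseSharedCacheResourceResponse use(UseSharedCacheResourceRequest request)
       throws YarnException, IOException;

   ReleaseSharedCacheResourceResponse release(ReleaseSharedCacheResourceRequest request)
       throws YarnException, IOException;
 }
 {code}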



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2569) Log Handling for LRS API Changes

2014-09-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2569:

Attachment: YARN-2569.4.1.patch

 Log Handling for LRS API Changes
 

 Key: YARN-2569
 URL: https://issues.apache.org/jira/browse/YARN-2569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
 YARN-2569.4.1.patch, YARN-2569.4.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144184#comment-14144184
 ] 

Hadoop QA commented on YARN-1959:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670555/YARN-1959.002.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5077//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5077//console

This message is automatically generated.

 Fix headroom calculation in Fair Scheduler
 --

 Key: YARN-1959
 URL: https://issues.apache.org/jira/browse/YARN-1959
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sandy Ryza
Assignee: Anubhav Dhoot
 Attachments: YARN-1959.001.patch, YARN-1959.002.patch, 
 YARN-1959.prelim.patch


 The Fair Scheduler currently always sets the headroom to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144198#comment-14144198
 ] 

Hadoop QA commented on YARN-2569:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670568/YARN-2569.4.1.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5078//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5078//console

This message is automatically generated.

 Log Handling for LRS API Changes
 

 Key: YARN-2569
 URL: https://issues.apache.org/jira/browse/YARN-2569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
 YARN-2569.4.1.patch, YARN-2569.4.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144268#comment-14144268
 ] 

Zhijie Shen commented on YARN-2569:
---

LGTM in general. Some comments about the patch.

1. Per discussion offline, is it a bit aggressive to mark the new APIs 
\@Stable? In particular when the class is marked \@Evolving. BTW, should we 
make LogAggregationContext \@Public?

2. It would be good to describe what kind of pattern the user should use. A 
wildcard pattern? 
http://en.wikipedia.org/wiki/Wildcard_character#File_and_directory_patterns

3. Missing a full stop?
{code}
+ * how often the logAggregationSerivce uploads container logs in seconds
{code}

4. The description is broken?
{code}
+   *  to set
{code}

5. It shouldn't be part of the API?
{code}
+
+  @Private
+  public abstract LogAggregationContextProto getProto();
{code}

 Log Handling for LRS API Changes
 

 Key: YARN-2569
 URL: https://issues.apache.org/jira/browse/YARN-2569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
 YARN-2569.4.1.patch, YARN-2569.4.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2581) NMs need to find a way to get LogAggregationContext

2014-09-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2581:

Attachment: YARN-2581.1.patch

 NMs need to find a way to get LogAggregationContext
 ---

 Key: YARN-2581
 URL: https://issues.apache.org/jira/browse/YARN-2581
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2581.1.patch


 After YARN-2569, we have a LogAggregationContext for the application in 
 ApplicationSubmissionContext. NMs need to find a way to get this information.
 We have this requirement: all containers in the same application should 
 honor the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2584) TestContainerManagerSecurity fails on trunk

2014-09-22 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2584:
-

 Summary: TestContainerManagerSecurity fails on trunk
 Key: YARN-2584
 URL: https://issues.apache.org/jira/browse/YARN-2584
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen


{code}
Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 561.964 sec  
FAILURE! - in org.apache.hadoop.yarn.server.TestContainerManagerSecurity
testContainerManager[0](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
  Time elapsed: 259.553 sec   FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)

testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
  Time elapsed: 258.762 sec   FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-09-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144301#comment-14144301
 ] 

Zhijie Shen commented on YARN-2320:
---

The console log only shows TestContainerManagerSecurity, which seems to fail on 
trunk as well. Filed a JIRA for it: YARN-2584.

 Removing old application history store after we store the history data to 
 timeline store
 

 Key: YARN-2320
 URL: https://issues.apache.org/jira/browse/YARN-2320
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2320.1.patch, YARN-2320.2.patch


 After YARN-2033, we should deprecate the application history store set. There's 
 no need to maintain two sets of store interfaces. In addition, we should 
 conclude the outstanding JIRAs under YARN-321 about the application history 
 store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2585) TestContainerManagerSecurity failed on trunk

2014-09-22 Thread Junping Du (JIRA)
Junping Du created YARN-2585:


 Summary: TestContainerManagerSecurity failed on trunk
 Key: YARN-2585
 URL: https://issues.apache.org/jira/browse/YARN-2585
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2584) TestContainerManagerSecurity fails on trunk

2014-09-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-2584:
-

Assignee: Jian He

 TestContainerManagerSecurity fails on trunk
 ---

 Key: YARN-2584
 URL: https://issues.apache.org/jira/browse/YARN-2584
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Jian He

 {code}
 Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 561.964 sec 
  FAILURE! - in org.apache.hadoop.yarn.server.TestContainerManagerSecurity
 testContainerManager[0](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
   Time elapsed: 259.553 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertFalse(Assert.java:64)
   at org.junit.Assert.assertFalse(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)
 testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
   Time elapsed: 258.762 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertFalse(Assert.java:64)
   at org.junit.Assert.assertFalse(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2584) TestContainerManagerSecurity fails on trunk

2014-09-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2584:
--
Attachment: YARN-2584.1.patch

uploaded a patch

 TestContainerManagerSecurity fails on trunk
 ---

 Key: YARN-2584
 URL: https://issues.apache.org/jira/browse/YARN-2584
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Jian He
 Attachments: YARN-2584.1.patch


 {code}
 Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 561.964 sec 
  FAILURE! - in org.apache.hadoop.yarn.server.TestContainerManagerSecurity
 testContainerManager[0](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
   Time elapsed: 259.553 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertFalse(Assert.java:64)
   at org.junit.Assert.assertFalse(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)
 testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
   Time elapsed: 258.762 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertFalse(Assert.java:64)
   at org.junit.Assert.assertFalse(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2585) TestContainerManagerSecurity failed on trunk

2014-09-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-2585.
--
Resolution: Duplicate

 TestContainerManagerSecurity failed on trunk
 

 Key: YARN-2585
 URL: https://issues.apache.org/jira/browse/YARN-2585
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2581) NMs need to find a way to get LogAggregationContext

2014-09-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2581:

Attachment: YARN-2581.2.patch

 NMs need to find a way to get LogAggregationContext
 ---

 Key: YARN-2581
 URL: https://issues.apache.org/jira/browse/YARN-2581
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2581.1.patch, YARN-2581.2.patch


 After YARN-2569, we have a LogAggregationContext for the application in 
 ApplicationSubmissionContext. NMs need to find a way to get this information.
 We have this requirement: all containers in the same application should 
 honor the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2584) TestContainerManagerSecurity fails on trunk

2014-09-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144321#comment-14144321
 ] 

Junping Du commented on YARN-2584:
--

Patch looks good to me. +1 pending the Jenkins result.

 TestContainerManagerSecurity fails on trunk
 ---

 Key: YARN-2584
 URL: https://issues.apache.org/jira/browse/YARN-2584
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Jian He
 Attachments: YARN-2584.1.patch


 {code}
 Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 561.964 sec 
  FAILURE! - in org.apache.hadoop.yarn.server.TestContainerManagerSecurity
 testContainerManager[0](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
   Time elapsed: 259.553 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertFalse(Assert.java:64)
   at org.junit.Assert.assertFalse(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)
 testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
   Time elapsed: 258.762 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertFalse(Assert.java:64)
   at org.junit.Assert.assertFalse(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2569) Log Handling for LRS API Changes

2014-09-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2569:

Attachment: YARN-2569.5.patch

Addressed all the comments

 Log Handling for LRS API Changes
 

 Key: YARN-2569
 URL: https://issues.apache.org/jira/browse/YARN-2569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
 YARN-2569.4.1.patch, YARN-2569.4.patch, YARN-2569.5.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated

2014-09-22 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2312:
-
Attachment: YARN-2312.1.patch

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch, YARN-2312.1.patch


 After YARN-2229, {{ContainerId#getId}} will only return a partial value of the 
 containerId: the sequence number of the container id without the epoch. We 
 should mark {{ContainerId#getId}} as deprecated and use 
 {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-09-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144335#comment-14144335
 ] 

Tsuyoshi OZAWA commented on YARN-2312:
--

Attached a first patch.

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch, YARN-2312.1.patch


 After YARN-2229, {{ContainerId#getId}} will only return a partial value of the 
 containerId: the sequence number of the container id without the epoch. We 
 should mark {{ContainerId#getId}} as deprecated and use 
 {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2468) Log handling for LRS

2014-09-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2468:

Attachment: YARN-2468.7.patch

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
 YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
 YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
 YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.patch


 Currently, when an application is finished, the NM will start the log 
 aggregation. But for long-running service applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) are written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2584) TestContainerManagerSecurity fails on trunk

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144349#comment-14144349
 ] 

Hadoop QA commented on YARN-2584:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670603/YARN-2584.1.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5079//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5079//console

This message is automatically generated.

 TestContainerManagerSecurity fails on trunk
 ---

 Key: YARN-2584
 URL: https://issues.apache.org/jira/browse/YARN-2584
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Jian He
 Attachments: YARN-2584.1.patch


 {code}
 Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 561.964 sec 
  FAILURE! - in org.apache.hadoop.yarn.server.TestContainerManagerSecurity
 testContainerManager[0](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
   Time elapsed: 259.553 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertFalse(Assert.java:64)
   at org.junit.Assert.assertFalse(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)
 testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)
   Time elapsed: 258.762 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertFalse(Assert.java:64)
   at org.junit.Assert.assertFalse(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:365)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:304)
   at 
 org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:149)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144367#comment-14144367
 ] 

Hadoop QA commented on YARN-2569:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670606/YARN-2569.5.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5080//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5080//console

This message is automatically generated.

 Log Handling for LRS API Changes
 

 Key: YARN-2569
 URL: https://issues.apache.org/jira/browse/YARN-2569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
 YARN-2569.4.1.patch, YARN-2569.4.patch, YARN-2569.5.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144381#comment-14144381
 ] 

Zhijie Shen commented on YARN-2569:
---

+1 for the latest patch. I'll leave it until tomorrow in case Vinod has further 
comments about it.

 Log Handling for LRS API Changes
 

 Key: YARN-2569
 URL: https://issues.apache.org/jira/browse/YARN-2569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
 YARN-2569.4.1.patch, YARN-2569.4.patch, YARN-2569.5.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)