[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141791#comment-14141791
 ] 

Hadoop QA commented on YARN-668:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670165/YARN-668.patch
  against trunk revision f85cc14.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 22 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warning.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.TestApplication
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.TestContainer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5058//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5058//artifact/PreCommit-HADOOP-Build-patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5058//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5058//console

This message is automatically generated.

> TokenIdentifier serialization should consider Unknown fields
> 
>
> Key: YARN-668
> URL: https://issues.apache.org/jira/browse/YARN-668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-668-demo.patch, YARN-668.patch
>
>
> This would allow changing of the TokenIdentifier between versions. The 
> current serialization is Writable. A simple way to achieve this would be to 
> have a Proto object as the payload for TokenIdentifiers, instead of 
> individual fields.
> TokenIdentifier continues to implement Writable to work with the RPC layer - 
> but the payload itself is serialized using PB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-19 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141736#comment-14141736
 ] 

Craig Welch commented on YARN-2496:
---

Ach, not finished - anyway, re 2 - a particular job may well only be able to 
use nodes with one label in a queue, and so if the headroom includes nodes 
without that label, we'll end up with another deadlock case where it spins up 
reducers too early and then can't complete its maps.  It is definitely a valid 
use case to have a queue with two labels (a and b, as in this example) and an 
app which is consistently requesting only one of those two labels (from 
submission to completion...) - perhaps only "a" nodes have the "special 
resource" it needs (special hardware capability, etc).  For this reason, 
headroom should reflect the labels in the last resource request from the 
application, not the queue's labels.

(Re 5, I thought * could be in requests; if not, then it should not be an issue.)

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" will 
> be used.
> - Check whether labels can be accessed by the queue when submitting an app with 
> a label-expression to the queue or when updating a ResourceRequest with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-19 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141724#comment-14141724
 ] 

Craig Welch commented on YARN-2496:
---

Ok, so it sounds like 1, 3 and 4 are ok.  

I think that 2 is still a problem though - headroom is an app-level value, and 
even though an app may be able to use either label in a resource request, in 
some cases it will not be able to.  A typical case will be where a subset of 
nodes in  

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" will 
> be used.
> - Check whether labels can be accessed by the queue when submitting an app with 
> a label-expression to the queue or when updating a ResourceRequest with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141694#comment-14141694
 ] 

Hadoop QA commented on YARN-2569:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670166/YARN-2569.3.patch
  against trunk revision b6ceef9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5057//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5057//console

This message is automatically generated.

> Log Handling for LRS API Changes
> 
>
> Key: YARN-2569
> URL: https://issues.apache.org/jira/browse/YARN-2569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-19 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-668:

Attachment: YARN-668.patch

Updated the patch to cover NMTokenIdentifier, AMRMTokenIdentifier and 
ContainerTokenIdentifier, with a unit test for token compatibility. Will address 
the refactoring issues from Vinod's previous comments later.
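
A minimal sketch of the PB-payload approach this JIRA describes, assuming a hypothetical generated message named MyTokenIdentifierProto (not one of the actual YARN protos): the Writable methods only move the raw proto bytes, so fields unknown to an older reader are preserved instead of breaking deserialization.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class ProtoPayloadIdentifier implements Writable {
  // MyTokenIdentifierProto is a placeholder for a protobuf-generated class.
  private MyTokenIdentifierProto proto = MyTokenIdentifierProto.getDefaultInstance();

  @Override
  public void write(DataOutput out) throws IOException {
    byte[] bytes = proto.toByteArray();   // serialize the whole payload with PB
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    proto = MyTokenIdentifierProto.parseFrom(bytes); // unknown fields are retained by PB
  }
}
{code}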

> TokenIdentifier serialization should consider Unknown fields
> 
>
> Key: YARN-668
> URL: https://issues.apache.org/jira/browse/YARN-668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-668-demo.patch, YARN-668.patch
>
>
> This would allow changing of the TokenIdentifier between versions. The 
> current serialization is Writable. A simple way to achieve this would be to 
> have a Proto object as the payload for TokenIdentifiers, instead of 
> individual fields.
> TokenIdentifier continues to implement Writable to work with the RPC layer - 
> but the payload itself is serialized using PB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2569) Log Handling for LRS API Changes

2014-09-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2569:

Attachment: YARN-2569.3.patch

> Log Handling for LRS API Changes
> 
>
> Key: YARN-2569
> URL: https://issues.apache.org/jira/browse/YARN-2569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141672#comment-14141672
 ] 

Hadoop QA commented on YARN-2569:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670161/YARN-2569.2.patch
  against trunk revision b6ceef9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  org.apache.hadoop.yarn.api.TestPBImplRecords

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5056//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5056//console

This message is automatically generated.

> Log Handling for LRS API Changes
> 
>
> Key: YARN-2569
> URL: https://issues.apache.org/jira/browse/YARN-2569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2569.1.patch, YARN-2569.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141661#comment-14141661
 ] 

Wangda Tan commented on YARN-2496:
--

Hi [~cwelch],
Thanks for your comments:

1) Regarding,
{code}
Headroom Calculation for JobA:
userConsumed = 8G
maxCapacityConsiderLabelA = 6G (Node1 only)
headroom = -2G (assume it will normalize to 0G)
{code}

Currently, we calculate headroom by:
bq. Headroom = min(userLimit, queue-max-cap, max-capacity-consider-label) - consumed
The {{max-capacity-consider-label}} is queue-wise, not app-wise, so at the 
queue level max-capacity-consider-label = node1 + node2.
You can think of it this way: {{max-capacity-consider-label}} is guaranteed to 
always be larger than or equal to the total resource the queue will use.
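
A rough sketch of how that formula composes, with hypothetical variable names (this is not the patch's code; Resources.min/subtract are the helpers in org.apache.hadoop.yarn.util.resource.Resources):
{code}
// Sketch only: userLimit, queueMaxCap, maxCapacityConsiderLabel and consumed are
// assumed to be Resource values computed elsewhere; rc is the ResourceCalculator.
Resource limit = Resources.min(rc, clusterResource, userLimit,
    Resources.min(rc, clusterResource, queueMaxCap, maxCapacityConsiderLabel));
Resource headroom = Resources.subtract(limit, consumed);
// A negative result would then be clamped to zero before being reported to the AM.
{code}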

2) Regarding
bq. The "labels" are the labels for the queue, but the resource requests coming 
from the application can be a subset of that, no? So if application "a" is 
running on a queue with lables a and b, but it has a label expression of only 
a, which it is using for resource requests, it's going to get a headroom based 
on nodes with both labels a and b, but in fact it only has a "real" headroom 
for nodes with label "a"
Yes and no: even if app-a has label "a" at the app level, its ResourceRequests 
can still override it and use b. The app-level label is just the default label 
expression applied when a ResourceRequest doesn't set one. So app-a can still 
use all labels of the queue.

3) Regarding,
bq. On the parent/leaf refactor to share AbstractCSQueue - a great idea, 
thought about it myself when seeing the duplication,
I agree with that. I think it may not be too risky, but it does hide the 
functional changes we made. Let's get more opinions on this, because reverting 
it would take some effort.

4) Regarding,
bq. CSQueueUtils - just removing a line, should revert
Will do 

5) Regarding,
bq. SchedulerUtils.checkNodeLabelExpression - I think there is an issue here 
with the * case
{{checkNodeLabelExpression}} is used to check whether a ResourceRequest can be 
allocated on a node. We don't support specifying * in any label-expression 
(including ResourceRequest, ASC and queue-default-label-expression); that would 
cause many problems.
Instead, we support * in a queue's labels (not in default-label-expression), 
which means the queue *can* access any label. The checking methods are 
{{checkQueueAccessToNode}} and {{checkQueueLabelExpression}}.

Thanks,
Wangda

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" will 
> be used.
> - Check whether labels can be accessed by the queue when submitting an app with 
> a label-expression to the queue or when updating a ResourceRequest with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-19 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141651#comment-14141651
 ] 

Xuan Gong commented on YARN-2569:
-

API-only changes.
Define {{LogAggregationContext}} as:
{code}
message LogAggregationContextProto {
  optional string include_pattern = 1 [default = ".*"];
  optional string exclude_pattern = 2 [default = ""];
  optional int64 rolling_interval_seconds = 3 [default = 0];
}
{code}
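
A minimal usage sketch, assuming standard protobuf (proto2) Java code generation for the message above; the enclosing outer class and package are omitted here:
{code}
// Sketch only: exercises the generated builder for the proto above.
LogAggregationContextProto ctx = LogAggregationContextProto.newBuilder()
    .setIncludePattern("stdout|stderr")   // only aggregate log files matching this pattern
    .setRollingIntervalSeconds(3600)      // roll up and upload logs every hour
    .build();
// Fields left unset fall back to the declared defaults:
assert ".*".equals(LogAggregationContextProto.getDefaultInstance().getIncludePattern());
assert "".equals(ctx.getExcludePattern());
{code}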

> Log Handling for LRS API Changes
> 
>
> Key: YARN-2569
> URL: https://issues.apache.org/jira/browse/YARN-2569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2569.1.patch, YARN-2569.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2569) Log Handling for LRS API Changes

2014-09-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2569:

Attachment: YARN-2569.2.patch

> Log Handling for LRS API Changes
> 
>
> Key: YARN-2569
> URL: https://issues.apache.org/jira/browse/YARN-2569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2569.1.patch, YARN-2569.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-09-19 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141644#comment-14141644
 ] 

Robert Kanter commented on YARN-1530:
-

I also agree that providing reliability through an “always-up” ATS service is 
not the optimal solution here for the reasons already mentioned.  We should 
instead make the write path and backing store reliable (or at least somehow 
recoverable).  

{quote}
Though each application can write the timeline entities into HDFS in a 
distributed manner, there’s still a single timeline server that fetches the 
files of the timeline entities written by ALL applications. The bottleneck is 
still there. Essentially I don’t see any difference between publishing entities 
via HTTP REST interface and via HDFS in terms of scalability.{quote}
Technically yes, there is still the same bottleneck.  However, with the HDFS 
channel, the ATS can essentially throttle the events.  Suppose you have a 
cluster pushing X events/second to the ATS.  With the REST implementation, the 
ATS must try to handle X events every second; if it can’t keep up, or if it 
gets too many incoming connections, there’s not too much we can do here.  I 
suppose we could add active-active HA so we have more ATS servers running, but 
I’m not sure we want to make that a requirement — we’d also have to come up 
with a good way of balancing this.  With the HDFS implementation, the ATS has 
more control over how it ingests the events: for example, it could read a 
maximum of Y events per poll, or Y events per job, etc.  While this will slow 
down the availability of the events in the ATS, it will allow it to keep 
running normally and not require active-active HA.  And if we make this 
configurable enough, users with beefier ATS machines could increase Y.
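
To illustrate the throttling idea (a generic sketch; none of the names below come from the ATS code base):
{code}
import java.util.List;
import java.util.concurrent.TimeUnit;

// e.g. backed by the timeline-entity files written to HDFS
interface EventSource { List<String> poll(int maxEvents); }
// e.g. the timeline store behind the ATS
interface EventSink { void put(List<String> events); }

class ThrottledIngester implements Runnable {
  private final EventSource source;
  private final EventSink sink;
  private final int maxEventsPerPoll;   // the "Y" discussed above
  private final long idleSleepMs;

  ThrottledIngester(EventSource source, EventSink sink,
      int maxEventsPerPoll, long idleSleepMs) {
    this.source = source;
    this.sink = sink;
    this.maxEventsPerPoll = maxEventsPerPoll;
    this.idleSleepMs = idleSleepMs;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      // Never ingest more than Y events per cycle, regardless of how fast the
      // writers produce them; the backlog simply waits in HDFS until the next poll.
      List<String> batch = source.poll(maxEventsPerPoll);
      if (!batch.isEmpty()) {
        sink.put(batch);
      } else {
        try {
          TimeUnit.MILLISECONDS.sleep(idleSleepMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }
  }
}
{code}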

It sounds like there are two areas where we’re having difficulty coming to a 
consensus:
# The write path/communication channel from the TimelineClient to the ATS or 
backing store
# The backing store itself

I can see reasons for having different implementations for the backing store 
given that HBase is a “heavy” external service and we should have something 
that works out-of-the-box.  Ideally, I think it would be best if we could all 
agree on a single write path, though making it pluggable is certainly an 
option.  As for maintaining them, I think we should be fine as long as we don’t 
have too many implementations.  We already do that for other components, such 
as the scheduler; though we should be careful to make sure that the different 
implementations only implement what they need to and any shareable code is 
shared.  In making the write path pluggable, we’d have to have two pieces: one 
to do the writing from the TimelineClient and one to do the receiving in the ATS.  
These would have to be in pairs.  We’ve already discussed some different 
implementations for this: REST, Kafka, and HDFS.  

The backing store is already pluggable.  Though as bc pointed out before, it’s 
fine for more experienced users to use HBase, but “regular” users should have a 
solution as well that is hopefully more scalable and reliable than LevelDB.  It 
would be great if we could provide a backing store that’s in between LevelDB 
and HBase.  And I think it’s fine for it to be external to Hadoop as long as it’s 
relatively simple to set up and maintain.  Though I’ll admit I’m not really sure 
what store we could use.  Does anyone have any suggestions on this?

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --
>
> Key: YARN-1530
> URL: https://issues.apache.org/jira/browse/YARN-1530
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
> Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
> ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
> application timeline design-20140116.pdf, application timeline 
> design-20140130.pdf, application timeline design-20140210.pdf
>
>
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store, manage and serve per-framework 
> data all by itself as YARN doesn't have a common solution. This JIRA attempts 
> to solve the storage, management and serving of per-framework data from 
> various applications, both running and finished. The aim is to change YARN to 
> collect and store data in a generic manner with plugin points for frameworks 
> to do their own thing w.r.t interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141597#comment-14141597
 ] 

Hadoop QA commented on YARN-1372:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670139/YARN-1372.010.patch
  against trunk revision 9e35571.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5054//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5054//console

This message is automatically generated.

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
> YARN-1372.009.patch, YARN-1372.009.patch, YARN-1372.010.patch, 
> YARN-1372.prelim.patch, YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-registering with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141534#comment-14141534
 ] 

Hadoop QA commented on YARN-1372:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670117/YARN-1372.009.patch
  against trunk revision 9e35571.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warning.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5053//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5053//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5053//console

This message is automatically generated.

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
> YARN-1372.009.patch, YARN-1372.009.patch, YARN-1372.010.patch, 
> YARN-1372.prelim.patch, YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-registering with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141515#comment-14141515
 ] 

Hadoop QA commented on YARN-2252:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654298/YARN-2252-1.patch
  against trunk revision 9e35571.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5055//console

This message is automatically generated.

> Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
> 
>
> Key: YARN-2252
> URL: https://issues.apache.org/jira/browse/YARN-2252
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: trunk-win
>Reporter: Ratandeep Ratti
>  Labels: hadoop2, scheduler, yarn
> Attachments: YARN-2252-1.patch
>
>
> This test case is failing sporadically on my machine. I think I have a 
> plausible explanation for this.
> It seems that when the Scheduler is being asked for resources, the resource 
> requests that are being constructed have no preference for the hosts (nodes).
> The two mock hosts constructed both have a memory of 8192 MB.
> The containers (resources) being requested each require a memory of 1024 MB, 
> hence a single node can execute both of the resource requests for the 
> application.
> At the end of the test case it is asserted that the containers 
> (resource requests) be executed on different nodes, but since we haven't 
> specified any preference for nodes when requesting the resources, the 
> scheduler (at times) executes both containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-09-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141512#comment-14141512
 ] 

Karthik Kambatla commented on YARN-2252:


The fix is specific to testContinuousScheduling. I propose fixing all the tests 
under TestFairScheduler by using the same global field {{scheduler}} in all 
the tests, even when a test method initializes a new instance of the 
scheduler. The teardown method should also stop {{resourceManager}} if it is not 
null.

We should follow this up with (1) re-using the scheduler instance within 
{{resourceManager}} and (2) moving tests with different FS configurations out 
to separate files. 
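
A rough sketch of the proposed teardown shape (field names are taken from the comment above, not from the actual test code):
{code}
@After
public void tearDown() {
  if (scheduler != null) {
    scheduler.stop();
    scheduler = null;
  }
  if (resourceManager != null) {
    resourceManager.stop();
    resourceManager = null;
  }
}
{code}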

> Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
> 
>
> Key: YARN-2252
> URL: https://issues.apache.org/jira/browse/YARN-2252
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: trunk-win
>Reporter: Ratandeep Ratti
>  Labels: hadoop2, scheduler, yarn
> Attachments: YARN-2252-1.patch
>
>
> This test case is failing sporadically on my machine. I think I have a 
> plausible explanation for this.
> It seems that when the Scheduler is being asked for resources, the resource 
> requests that are being constructed have no preference for the hosts (nodes).
> The two mock hosts constructed both have a memory of 8192 MB.
> The containers (resources) being requested each require a memory of 1024 MB, 
> hence a single node can execute both of the resource requests for the 
> application.
> At the end of the test case it is asserted that the containers 
> (resource requests) be executed on different nodes, but since we haven't 
> specified any preference for nodes when requesting the resources, the 
> scheduler (at times) executes both containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-19 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141508#comment-14141508
 ] 

Craig Welch commented on YARN-2496:
---


SchedulerUtils.checkNodeLabelExpression - I think there is an issue here with 
the * case. As I read the code, a * expression will not properly match against a 
node with a label - after the first check, there should be a check for ANY in 
the expression and, if it is present, return true.
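
A sketch of the shape of the suggested fix (hypothetical code, not the actual SchedulerUtils method):
{code}
// Hypothetical sketch of the suggested check -- not the real SchedulerUtils code.
public static boolean checkNodeLabelExpression(Set<String> nodeLabels,
    String labelExpression) {
  // existing behaviour: an empty expression matches any node
  if (labelExpression == null || labelExpression.trim().isEmpty()) {
    return true;
  }
  // suggested addition: "*" (ANY) should also match every node, labelled or not
  if (labelExpression.trim().equals("*")) {
    return true;
  }
  // otherwise every requested label must be present on the node
  for (String label : labelExpression.split("&&")) {
    if (!nodeLabels.contains(label.trim())) {
      return false;
    }
  }
  return true;
}
{code}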

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" will 
> be used.
> - Check whether labels can be accessed by the queue when submitting an app with 
> a label-expression to the queue or when updating a ResourceRequest with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1372:

Attachment: YARN-1372.010.patch

Fixed findbug warning and testcase failure

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
> YARN-1372.009.patch, YARN-1372.009.patch, YARN-1372.010.patch, 
> YARN-1372.prelim.patch, YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-registering with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141418#comment-14141418
 ] 

Hadoop QA commented on YARN-1372:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670098/YARN-1372.008.patch
  against trunk revision aa1052c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warning.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5051//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5051//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5051//console

This message is automatically generated.

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
> YARN-1372.009.patch, YARN-1372.009.patch, YARN-1372.prelim.patch, 
> YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-registering with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1372:

Attachment: YARN-1372.009.patch

Rebased

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
> YARN-1372.009.patch, YARN-1372.009.patch, YARN-1372.prelim.patch, 
> YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-registering with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141405#comment-14141405
 ] 

Hadoop QA commented on YARN-1372:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670113/YARN-1372.009.patch
  against trunk revision 9e35571.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5052//console

This message is automatically generated.

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
> YARN-1372.009.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-registering with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1372:

Attachment: YARN-1372.009.patch

Redo upload patch to kick jenkins. Removed unnecessary default assignment

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
> YARN-1372.009.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-registering with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-19 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141369#comment-14141369
 ] 

Craig Welch commented on YARN-2496:
---

CSQueueUtils - just removing a line, should revert

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" will 
> be used.
> - Check whether labels can be accessed by the queue when submitting an app with 
> a label-expression to the queue or when updating a ResourceRequest with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-19 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141367#comment-14141367
 ] 

Craig Welch commented on YARN-2496:
---

On the parent/leaf refactor to share AbstractCSQueue - a great idea, thought 
about it myself when seeing the duplication, but I think that doing it while 
making changes like adding node labels adds confusion and makes it harder to 
see the functional changes; I think it should have been done in isolation at some 
point (where no other changes were occurring).  I don’t think you should change 
course on it now (I'm not suggesting any changes to what you have... I think it 
would be more risky than not at this point), just a thought for future cases 
like this.

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" will 
> be used.
> - Check whether labels can be accessed by the queue when submitting an app with 
> a label-expression to the queue or when updating a ResourceRequest with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-19 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141363#comment-14141363
 ] 

Allen Wittenauer commented on YARN-913:
---

* I have some concern around 'naked' zookeeper.* config options.  If another 
part of Hadoop uses ZK, would they be expected to use the same ZK options?  
e.g., what if I want a different ZK setup for a YARN registrar vs. the NN?  

* Is there any risk in a user writing to a ZK instance shared with the RM? i.e., if a 
user kills the ZK used for the app registry through some action, what happens to 
the RM and to other users' bits that are running?

 * Why doesn't the hostname component allow for FQDNs? 

* Are we prepared for more backlash when another component requires working 
DNS? :)

* Is ZK actually the right thing to use here?
{code}
+Zookeeper has a default limit of 1MB/node. If all endpoints of a service or
+component are stored in JSON attached to that node, then there is a total limit
+of 1MB of all endpoint registration data.
{code}

If we are reducing certain fields out of fear of hitting this limit, this seems 
to indicate that ZK is a bad fit and/or we are using ZK incorrectly.  It could 
be argued that ZK should contain a pointer to the actual data stored on HDFS.  
This allows for future growth without having to worry about blowing ZK out.
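
To make the "pointer in ZK, payload in HDFS" idea concrete, a minimal sketch using the stock ZooKeeper client API (the paths and znode layout here are made up, not part of the proposal):
{code}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RegistryPointerSketch {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
    // Keep the znode tiny: just a reference to the full endpoint document on HDFS.
    // (Assumes the parent path already exists.)
    byte[] pointer = "hdfs://nn1/registry/service-0001/endpoints.json"
        .getBytes(StandardCharsets.UTF_8);
    zk.create("/registry/service-0001", pointer,
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    zk.close();
  }
}
{code}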

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up - or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> - and not any others in the cluster.
> Some kind of service registry - in the RM, in ZK - could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-19 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141355#comment-14141355
 ] 

Craig Welch commented on YARN-2496:
---

I'm also concerned about this:

{code}
Resource maxCapacityConsiderLabel =
    labelManager == null ? clusterResource : labelManager.getQueueResource(
        queueName, labels, clusterResource);
{code}

The "labels" are the labels for the queue, but the resource requests coming 
from the application can be a subset of that, no?  So if application "a" is 
running on a queue with lables a and b, but it has a label expression of only 
a, which it is using for resource requests, it's going to get a headroom based 
on nodes with both labels a and b, but in fact it only has a "real" headroom 
for nodes with label "a"
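
As a rough illustration of the gap, here is a sketch that narrows the maximum 
capacity to the labels actually named in the app's label expression rather than 
to all labels of the queue; the map and the method name are simplified stand-ins, 
not the real scheduler or labelManager API:

{code}
import java.util.Map;
import java.util.Set;

public class LabelHeadroomSketch {
  /**
   * Sum the capacity of only those labels that appear in the app's label
   * expression, instead of every label the queue can access.
   * resourceByLabel maps a label to the total memory (GB) of nodes carrying it
   * (nodes with multiple labels are ignored here for simplicity).
   */
  static long maxCapacityForExpression(Map<String, Long> resourceByLabel,
      Set<String> queueLabels, Set<String> appLabelExpression) {
    long max = 0;
    for (String label : appLabelExpression) {
      // The expression is expected to be a subset of the queue's labels.
      if (queueLabels.contains(label)) {
        max += resourceByLabel.getOrDefault(label, 0L);
      }
    }
    return max;
  }
}
{code}

With queue labels {a, b} and an expression of only "a", this returns just the 
label-a capacity, which is the "real" ceiling described above.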

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA Includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options like capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" will 
> be used.
> - Check whether labels can be accessed by the queue when submitting an app with 
> a label-expression to the queue or updating a ResourceRequest with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest with a 
> label-expression on that NM
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-19 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141342#comment-14141342
 ] 

Craig Welch commented on YARN-2496:
---

Let's say user UserU is running two jobs, JobA and JobB, in queue QueueAB, 
which has both LabelA and LabelB.  JobA has label LabelA and JobB has LabelB.  
It's a two-node cluster, Node1 and Node2.  Node1 has LabelA and Node2 has LabelB.

Let's say the user has access to 100% of the cluster (just the one queue, etc).

Let's say that JobA is using 4G of RAM and JobB is also using 4G.  Let's say 
each node has 6G.

Headroom Calculation for JobA:

userConsumed = 8G
maxCapacityConsiderLabelA = 6G (Node1 only)
headroom = -2G (assume it will normalize to 0G)

However, the user should still be able to use the remaining 2G for JobA, as 
they are only using 4 of the 6G available to that label. 

The issue I see is userConsumed: maxCapacityConsiderLabel considers the 
label, but userConsumed does not. It should really be "userConsumedForLabel"; if 
it were, JobA would see 2G as it should (as the consumption for LabelA is only 4G).  
The problem, I think, is in subtracting a cross-label value from a per-label 
value.
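
Plugging the example's numbers into a tiny sketch shows the difference between 
subtracting the cross-label consumption and the per-label consumption (the 6G/4G 
figures above, expressed in GB; the variable names are illustrative only):

{code}
public class HeadroomExampleSketch {
  public static void main(String[] args) {
    long maxCapacityLabelA = 6;      // Node1 only (GB)
    long userConsumedAllLabels = 8;  // JobA (4G on LabelA) + JobB (4G on LabelB)
    long userConsumedLabelA = 4;     // JobA only

    // Current behaviour: cross-label consumption subtracted from a per-label max.
    long headroomToday = Math.max(0, maxCapacityLabelA - userConsumedAllLabels);
    // Proposed: subtract only the consumption under the same label.
    long headroomPerLabel = Math.max(0, maxCapacityLabelA - userConsumedLabelA);

    System.out.println("headroom today     = " + headroomToday + "G");    // 0G
    System.out.println("headroom per label = " + headroomPerLabel + "G"); // 2G
  }
}
{code}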

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA Includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options like capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" will 
> be used.
> - Check whether labels can be accessed by the queue when submitting an app with 
> a label-expression to the queue or updating a ResourceRequest with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest with a 
> label-expression on that NM
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141337#comment-14141337
 ] 

Hadoop QA commented on YARN-1372:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670090/YARN-1372.007.patch
  against trunk revision 951847b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1266 javac 
compiler warnings (more than the trunk's current 1265 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

  org.apache.hadoop.mapreduce.v2.app.TestRecovery

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5050//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5050//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5050//console

This message is automatically generated.

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
> YARN-1372.prelim.patch, YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-register with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-19 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141302#comment-14141302
 ] 

Allen Wittenauer commented on YARN-2460:


+1 lgtm.

Committed to trunk and branch-2.

Thanks!


> Remove obsolete entries from yarn-default.xml
> -
>
> Key: YARN-2460
> URL: https://issues.apache.org/jira/browse/YARN-2460
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
> Fix For: 2.6.0
>
> Attachments: YARN-2460-01.patch, YARN-2460-02.patch
>
>
> The following properties are defined in yarn-default.xml, but do not exist in 
> YarnConfiguration.
>   mapreduce.job.hdfs-servers
>   mapreduce.job.jar
>   yarn.ipc.exception.factory.class
>   yarn.ipc.serializer.type
>   yarn.nodemanager.aux-services.mapreduce_shuffle.class
>   yarn.nodemanager.hostname
>   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
>   yarn.nodemanager.resourcemanager.connect.wait.secs
>   yarn.resourcemanager.amliveliness-monitor.interval-ms
>   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
>   yarn.resourcemanager.container.liveness-monitor.interval-ms
>   yarn.resourcemanager.nm.liveness-monitor.interval-ms
>   yarn.timeline-service.hostname
>   yarn.timeline-service.http-authentication.simple.anonymous.allowed
>   yarn.timeline-service.http-authentication.type
> Presumably, the mapreduce.* properties are okay.  Similarly, the 
> yarn.timeline-service.* properties are for the future TimelineService.  
> However, the rest are likely fully deprecated.
> Submitting bug for comment/feedback about which other properties should be 
> kept in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1372:

Attachment: YARN-1372.008.patch

After an offline discussion with Jian: remove containers from the NMStateStore only 
when they are removed from the context. That allows the AM time to get the 
notification.
> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.008.patch, 
> YARN-1372.prelim.patch, YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-register with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1372:

Attachment: YARN-1372.007.patch

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.007.patch, YARN-1372.prelim.patch, 
> YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-register with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141251#comment-14141251
 ] 

Hadoop QA commented on YARN-2198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12670085/YARN-2198.trunk.crlf.6.patch
  against trunk revision 951847b.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5049//console

This message is automatically generated.

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> --
>
> Key: YARN-2198
> URL: https://issues.apache.org/jira/browse/YARN-2198
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
> YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
> YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
> YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.crlf.6.patch
>
>
> YARN-1972 introduces a Secure Windows Container Executor. However, this 
> executor requires the process launching the container to be LocalSystem or 
> a member of the local Administrators group. Since the process in question 
> is the NodeManager, the requirement translates to the entire NM running as a 
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-09-19 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141249#comment-14141249
 ] 

Subru Krishnan commented on YARN-2080:
--

[~vinodkv], as discussed, I have opened YARN-2575 for tracking separate ACLs for 
reservation APIs.

> Admission Control: Integrate Reservation subsystem with ResourceManager
> ---
>
> Key: YARN-2080
> URL: https://issues.apache.org/jira/browse/YARN-2080
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, 
> YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch
>
>
> This JIRA tracks the integration of Reservation subsystem data structures 
> introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
> of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-19 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: YARN-2198.trunk.crlf.6.patch

This is trunk.6.patch with CRLF in the .vcxproj and .sln hunks, just to see if Mr. 
Jenkins is happy with it.

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> --
>
> Key: YARN-2198
> URL: https://issues.apache.org/jira/browse/YARN-2198
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
> YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
> YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
> YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.crlf.6.patch
>
>
> YARN-1972 introduces a Secure Windows Container Executor. However, this 
> executor requires the process launching the container to be LocalSystem or 
> a member of the local Administrators group. Since the process in question 
> is the NodeManager, the requirement translates to the entire NM running as a 
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141180#comment-14141180
 ] 

Hadoop QA commented on YARN-2460:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670063/YARN-2460-02.patch
  against trunk revision 9f03a7c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5048//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5048//console

This message is automatically generated.

> Remove obsolete entries from yarn-default.xml
> -
>
> Key: YARN-2460
> URL: https://issues.apache.org/jira/browse/YARN-2460
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-2460-01.patch, YARN-2460-02.patch
>
>
> The following properties are defined in yarn-default.xml, but do not exist in 
> YarnConfiguration.
>   mapreduce.job.hdfs-servers
>   mapreduce.job.jar
>   yarn.ipc.exception.factory.class
>   yarn.ipc.serializer.type
>   yarn.nodemanager.aux-services.mapreduce_shuffle.class
>   yarn.nodemanager.hostname
>   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
>   yarn.nodemanager.resourcemanager.connect.wait.secs
>   yarn.resourcemanager.amliveliness-monitor.interval-ms
>   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
>   yarn.resourcemanager.container.liveness-monitor.interval-ms
>   yarn.resourcemanager.nm.liveness-monitor.interval-ms
>   yarn.timeline-service.hostname
>   yarn.timeline-service.http-authentication.simple.anonymous.allowed
>   yarn.timeline-service.http-authentication.type
> Presumably, the mapreduce.* properties are okay.  Similarly, the 
> yarn.timeline-service.* properties are for the future TimelineService.  
> However, the rest are likely fully deprecated.
> Submitting bug for comment/feedback about which other properties should be 
> kept in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2576) Prepare yarn-1051 branch for merging with trunk

2014-09-19 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-2576:


 Summary: Prepare yarn-1051 branch for merging with trunk
 Key: YARN-2576
 URL: https://issues.apache.org/jira/browse/YARN-2576
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Subru Krishnan


This JIRA is to track the changes required to ensure branch yarn-1051 is ready 
to be merged with trunk. This includes fixing any compilation issues, findbugs 
and/or javadoc warnings, test case failures, etc., if any.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141148#comment-14141148
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670048/YARN-913-008.patch
  against trunk revision 25fd69a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 34 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1267 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-minikdc 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5047//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5047//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5047//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete ops

2014-09-19 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-2575:


 Summary: Consider creating separate ACLs for Reservation 
create/update/delete ops
 Key: YARN-2575
 URL: https://issues.apache.org/jira/browse/YARN-2575
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Subru Krishnan


YARN-1051 introduces the ReservationSystem and in the current implementation 
anyone who can submit applications can also submit reservations. This JIRA is 
to evaluate creating separate ACLs for Reservation create/update/delete ops.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2574) Add support for FairScheduler to the ReservationSystem

2014-09-19 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-2574:


 Summary: Add support for FairScheduler to the ReservationSystem
 Key: YARN-2574
 URL: https://issues.apache.org/jira/browse/YARN-2574
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Subru Krishnan


YARN-1051 introduces the ReservationSystem and the current implementation is 
based on CapacityScheduler. This JIRA proposes adding support for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1990) Track time-to-allocation for different size containers

2014-09-19 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1990:
-
Parent Issue: YARN-2572  (was: YARN-1051)

> Track time-to-allocation for different size containers 
> ---
>
> Key: YARN-1990
> URL: https://issues.apache.org/jira/browse/YARN-1990
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> Allocation of large containers is notoriously problematic, as smaller 
> containers can more easily grab resources. 
> The proposal for this JIRA is to maintain a map of container sizes and 
> time-to-allocation that can be used:
> * as general insight into cluster behavior, 
> * to inform the reservation-system, allowing us to "account for delays" in 
> allocation, so that the user reservation is respected regardless of the size of 
> the containers requested.
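
A minimal sketch of the kind of bookkeeping the description above suggests, keyed 
by container memory size with a running average of time-to-allocation; the class 
and method names are purely illustrative, not an existing YARN API:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class TimeToAllocationTracker {
  private static final class Stat {
    final AtomicLong count = new AtomicLong();
    final AtomicLong totalMillis = new AtomicLong();
  }

  // container memory (MB) -> accumulated time-to-allocation
  private final Map<Long, Stat> byMemory = new ConcurrentHashMap<>();

  /** Record how long a request of the given size waited before being allocated. */
  public void record(long memoryMb, long waitMillis) {
    Stat s = byMemory.computeIfAbsent(memoryMb, k -> new Stat());
    s.count.incrementAndGet();
    s.totalMillis.addAndGet(waitMillis);
  }

  /** Average wait for this container size, or -1 if nothing has been recorded. */
  public long averageWaitMillis(long memoryMb) {
    Stat s = byMemory.get(memoryMb);
    return (s == null || s.count.get() == 0) ? -1 : s.totalMillis.get() / s.count.get();
  }
}
{code}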



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2573) Integrate ReservationSystem with the RM failover mechanism

2014-09-19 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-2573:


 Summary: Integrate ReservationSystem with the RM failover mechanism
 Key: YARN-2573
 URL: https://issues.apache.org/jira/browse/YARN-2573
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Subru Krishnan


YARN-1051 introduces the ReservationSystem and the current implementation is 
completely in-memory based. YARN-149 brings in the notion of RM HA with a 
highly available state store. This JIRA proposes persisting the Plan into the 
RMStateStore and recovering it post RM failover



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2572) Enhancements to the ReservationSytem/Planner

2014-09-19 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan reassigned YARN-2572:


Assignee: Subru Krishnan

> Enhancements to the ReservationSytem/Planner
> 
>
> Key: YARN-2572
> URL: https://issues.apache.org/jira/browse/YARN-2572
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> YARN-1051 introduces a ReservationSystem/Planner that enables the YARN RM to 
> handle time explicitly, i.e., users can now "reserve" capacity ahead of time 
> which is predictably allocated to them. This is an umbrella JIRA to enhance 
> the reservation system by integrating it with the FairScheduler, the RM failover 
> mechanism, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2572) Enhancements to the ReservationSytem/Planner

2014-09-19 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-2572:


 Summary: Enhancements to the ReservationSytem/Planner
 Key: YARN-2572
 URL: https://issues.apache.org/jira/browse/YARN-2572
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan


YARN-1051 introduces a ReservationSystem/Planner that enables the YARN RM to 
handle time explicitly, i.e., users can now "reserve" capacity ahead of time 
which is predictably allocated to them. This is an umbrella JIRA to enhance the 
reservation system by integrating it with the FairScheduler, the RM failover 
mechanism, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-19 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2460:
-
Attachment: YARN-2460-02.patch

Adding one more place where

  yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs

needs to be replaced with

  yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-secs


> Remove obsolete entries from yarn-default.xml
> -
>
> Key: YARN-2460
> URL: https://issues.apache.org/jira/browse/YARN-2460
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-2460-01.patch, YARN-2460-02.patch
>
>
> The following properties are defined in yarn-default.xml, but do not exist in 
> YarnConfiguration.
>   mapreduce.job.hdfs-servers
>   mapreduce.job.jar
>   yarn.ipc.exception.factory.class
>   yarn.ipc.serializer.type
>   yarn.nodemanager.aux-services.mapreduce_shuffle.class
>   yarn.nodemanager.hostname
>   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
>   yarn.nodemanager.resourcemanager.connect.wait.secs
>   yarn.resourcemanager.amliveliness-monitor.interval-ms
>   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
>   yarn.resourcemanager.container.liveness-monitor.interval-ms
>   yarn.resourcemanager.nm.liveness-monitor.interval-ms
>   yarn.timeline-service.hostname
>   yarn.timeline-service.http-authentication.simple.anonymous.allowed
>   yarn.timeline-service.http-authentication.type
> Presumably, the mapreduce.* properties are okay.  Similarly, the 
> yarn.timeline-service.* properties are for the future TimelineService.  
> However, the rest are likely fully deprecated.
> Submitting bug for comment/feedback about which other properties should be 
> kept in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-19 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141070#comment-14141070
 ] 

Anubhav Dhoot commented on YARN-1372:
-

Addressed everything.
Regarding removeVeryOldStoppedContainersFromCache, I have reverted the changes. 
In case there are no acks and the application is long running, we still want 
to remove the state from the store after duration-to-track-stopped-containers. 
We may leave the state in the context. But we don't want to do the converse - 
removing from the context too early if duration-to-track-stopped-containers is too 
low.
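
A rough sketch of the retention rule being described, with hypothetical data 
structures standing in for the real NM state store and context; the point is that 
store entries expire on the duration-to-track timer even without an ack, while 
context entries are only dropped once the RM has acked them:

{code}
import java.util.Map;

public class StoppedContainerRetentionSketch {
  /**
   * storeStoppedAt: containerId -> completion timestamp (millis) in the state store.
   * contextAckedByRm: containerId -> whether the RM confirmed the AM pulled it.
   */
  static void cleanup(Map<String, Long> storeStoppedAt,
                      Map<String, Boolean> contextAckedByRm,
                      long durationToTrackMillis, long now) {
    // State store: expire purely on age, even if no ack ever arrives.
    storeStoppedAt.values().removeIf(t -> now - t > durationToTrackMillis);

    // Context: drop only acked entries, never just because the timer is short,
    // so the AM still gets its completed-container notification.
    contextAckedByRm.values().removeIf(acked -> Boolean.TRUE.equals(acked));
  }
}
{code}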

> Ensure all completed containers are reported to the AMs across RM restart
> -
>
> Key: YARN-1372
> URL: https://issues.apache.org/jira/browse/YARN-1372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
> YARN-1372.002_NMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, 
> YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
> YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
> YARN-1372.006.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch
>
>
> Currently the NM informs the RM about completed containers and then removes 
> those containers from the RM notification list. The RM passes on that 
> completed container information to the AM and the AM pulls this data. If the 
> RM dies before the AM pulls this data then the AM may not be able to get this 
> information again. To fix this, NM should maintain a separate list of such 
> completed container notifications sent to the RM. After the AM has pulled the 
> containers from the RM then the RM will inform the NM about it and the NM can 
> remove the completed container from the new list. Upon re-register with the 
> RM (after RM restart) the NM should send the entire list of completed 
> containers to the RM along with any other containers that completed while the 
> RM was dead. This ensures that the RM can inform the AMs about all completed 
> containers. Some container completions may be reported more than once since 
> the AM may have pulled the container but the RM may die before notifying the 
> NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-19 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141053#comment-14141053
 ] 

Ray Chiang commented on YARN-2460:
--

Okay, I'll fix that.  Thanks for finding it.

> Remove obsolete entries from yarn-default.xml
> -
>
> Key: YARN-2460
> URL: https://issues.apache.org/jira/browse/YARN-2460
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-2460-01.patch
>
>
> The following properties are defined in yarn-default.xml, but do not exist in 
> YarnConfiguration.
>   mapreduce.job.hdfs-servers
>   mapreduce.job.jar
>   yarn.ipc.exception.factory.class
>   yarn.ipc.serializer.type
>   yarn.nodemanager.aux-services.mapreduce_shuffle.class
>   yarn.nodemanager.hostname
>   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
>   yarn.nodemanager.resourcemanager.connect.wait.secs
>   yarn.resourcemanager.amliveliness-monitor.interval-ms
>   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
>   yarn.resourcemanager.container.liveness-monitor.interval-ms
>   yarn.resourcemanager.nm.liveness-monitor.interval-ms
>   yarn.timeline-service.hostname
>   yarn.timeline-service.http-authentication.simple.anonymous.allowed
>   yarn.timeline-service.http-authentication.type
> Presumably, the mapreduce.* properties are okay.  Similarly, the 
> yarn.timeline-service.* properties are for the future TimelineService.  
> However, the rest are likely fully deprecated.
> Submitting bug for comment/feedback about which other properties should be 
> kept in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2565) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore

2014-09-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141024#comment-14141024
 ] 

Jian He commented on YARN-2565:
---

looks good, committing

> RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting 
> FileSystemApplicationHistoryStore
> ---
>
> Key: YARN-2565
> URL: https://issues.apache.org/jira/browse/YARN-2565
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0
> Environment: Secure cluster with ATS (timeline server enabled) and 
> yarn.resourcemanager.system-metrics-publisher.enabled=true
> so that RM can send Application history to Timeline Store
>Reporter: Karam Singh
>Assignee: Zhijie Shen
> Attachments: YARN-2565.1.patch, YARN-2565.2.patch, YARN-2565.3.patch
>
>
> Observed that RM fails to start in Secure mode when the GenericHistoryService is 
> enabled and ResourceManager is set to use Timeline Store



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-19 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141019#comment-14141019
 ] 

Allen Wittenauer commented on YARN-2460:


Actually, there is a problem, so I'm taking back my +1. :(

The patch is slightly incomplete.  It renames 
yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs but 
doesn't rename the entries in 
hadoop-tools/hadoop-sls/src/main/data/2jobs2min-rumen-jh.json.


> Remove obsolete entries from yarn-default.xml
> -
>
> Key: YARN-2460
> URL: https://issues.apache.org/jira/browse/YARN-2460
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-2460-01.patch
>
>
> The following properties are defined in yarn-default.xml, but do not exist in 
> YarnConfiguration.
>   mapreduce.job.hdfs-servers
>   mapreduce.job.jar
>   yarn.ipc.exception.factory.class
>   yarn.ipc.serializer.type
>   yarn.nodemanager.aux-services.mapreduce_shuffle.class
>   yarn.nodemanager.hostname
>   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
>   yarn.nodemanager.resourcemanager.connect.wait.secs
>   yarn.resourcemanager.amliveliness-monitor.interval-ms
>   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
>   yarn.resourcemanager.container.liveness-monitor.interval-ms
>   yarn.resourcemanager.nm.liveness-monitor.interval-ms
>   yarn.timeline-service.hostname
>   yarn.timeline-service.http-authentication.simple.anonymous.allowed
>   yarn.timeline-service.http-authentication.type
> Presumably, the mapreduce.* properties are okay.  Similarly, the 
> yarn.timeline-service.* properties are for the future TimelineService.  
> However, the rest are likely fully deprecated.
> Submitting bug for comment/feedback about which other properties should be 
> kept in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-19 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141011#comment-14141011
 ] 

Allen Wittenauer commented on YARN-2460:


A quick double check seems to indicate that, yes, these properties aren't being 
used.

So +1 LGTM.  I'll commit here in a bit.

> Remove obsolete entries from yarn-default.xml
> -
>
> Key: YARN-2460
> URL: https://issues.apache.org/jira/browse/YARN-2460
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-2460-01.patch
>
>
> The following properties are defined in yarn-default.xml, but do not exist in 
> YarnConfiguration.
>   mapreduce.job.hdfs-servers
>   mapreduce.job.jar
>   yarn.ipc.exception.factory.class
>   yarn.ipc.serializer.type
>   yarn.nodemanager.aux-services.mapreduce_shuffle.class
>   yarn.nodemanager.hostname
>   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
>   yarn.nodemanager.resourcemanager.connect.wait.secs
>   yarn.resourcemanager.amliveliness-monitor.interval-ms
>   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
>   yarn.resourcemanager.container.liveness-monitor.interval-ms
>   yarn.resourcemanager.nm.liveness-monitor.interval-ms
>   yarn.timeline-service.hostname
>   yarn.timeline-service.http-authentication.simple.anonymous.allowed
>   yarn.timeline-service.http-authentication.type
> Presumably, the mapreduce.* properties are okay.  Similarly, the 
> yarn.timeline-service.* properties are for the future TimelineService.  
> However, the rest are likely fully deprecated.
> Submitting bug for comment/feedback about which other properties should be 
> kept in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-19 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-008.patch

-008 patch

# fixes last lurking findbug
# includes the registry site documentation 

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140918#comment-14140918
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670013/YARN-913-007.patch
  against trunk revision bf27b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 34 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1267 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-minikdc 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5046//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5046//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5046//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5046//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2571) RM to support YARN registry

2014-09-19 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-2571:


 Summary: RM to support YARN registry 
 Key: YARN-2571
 URL: https://issues.apache.org/jira/browse/YARN-2571
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Steve Loughran
Assignee: Steve Loughran


The RM needs to (optionally) integrate with the YARN registry:

# startup: create the /services and /users paths with system ACLs (yarn, hdfs 
principals)
# app-launch: create the user directory /users/$username with the relevant 
permissions (CRD) for them to create subnodes.
# attempt, container, app completion: remove service records with the matching 
persistence and ID
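
A minimal sketch of the startup and app-launch steps above, using a bare 
ZooKeeper client; the "sasl" ACL scheme, the principal names, and the class and 
method names are assumptions for illustration, not the registry's actual 
implementation:

{code}
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Perms;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

public class RegistryBootstrapSketch {
  /** Startup: system paths writable only by the system principals. */
  static void bootstrap(ZooKeeper zk) throws Exception {
    List<ACL> systemAcl = Arrays.asList(
        new ACL(Perms.ALL, new Id("sasl", "yarn")),
        new ACL(Perms.ALL, new Id("sasl", "hdfs")));
    zk.create("/services", new byte[0], systemAcl, CreateMode.PERSISTENT);
    zk.create("/users", new byte[0], systemAcl, CreateMode.PERSISTENT);
  }

  /** App-launch: per-user directory with create/read/delete (CRD) for the user. */
  static void onAppLaunch(ZooKeeper zk, String username) throws Exception {
    List<ACL> userAcl = Arrays.asList(
        new ACL(Perms.ALL, new Id("sasl", "yarn")),
        new ACL(Perms.CREATE | Perms.READ | Perms.DELETE, new Id("sasl", username)));
    zk.create("/users/" + username, new byte[0], userAcl, CreateMode.PERSISTENT);
  }
}
{code}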



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2473) YARN never cleans up container directories from a full disk

2014-09-19 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140798#comment-14140798
 ] 

Varun Vasudev commented on YARN-2473:
-

[~jlowe], can you please review the latest patch for 
YARN-90([apache-yarn-90.4.patch|https://issues.apache.org/jira/secure/attachment/12669998/apache-yarn-90.4.patch])?
 It should handle the case you pointed out here as well. Thanks!

> YARN never cleans up container directories from a full disk
> ---
>
> Key: YARN-2473
> URL: https://issues.apache.org/jira/browse/YARN-2473
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Varun Vasudev
>Priority: Blocker
>
> After YARN-1781 when a container ends up filling a local disk the nodemanager 
> will mark it as a bad disk and remove it from the list of good local dirs.  
> When the container eventually completes the files that filled the disk will 
> not be removed because the NM thinks the directory is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-19 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-007.patch

Should fix the Jenkins builds: javadocs, tests, and findbugs, leaving only a javac 
warning somewhere.


> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140676#comment-14140676
 ] 

Wangda Tan commented on YARN-796:
-

The failure should be unrelated to the changes; I found it failed in a recent 
JIRA as well: 
https://issues.apache.org/jira/browse/YARN-611?focusedCommentId=14129761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14129761.
 Filed MAPREDUCE-6098.


> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140663#comment-14140663
 ] 

Hadoop QA commented on YARN-90:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12669998/apache-yarn-90.4.patch
  against trunk revision bf27b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5045//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5045//console

This message is automatically generated.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), the NodeManager needs a restart. This JIRA is to improve the NodeManager 
> to reuse good disks (which could have been bad some time back).
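As a rough illustration of the direction (not the attached patches; the class, 
method names, and probe below are made up), the re-check could be a periodic 
probe that promotes a failed directory back to the good list once a simple 
create/delete test succeeds:

{code}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DiskRecheck {
  private final List<File> goodDirs = new ArrayList<File>();
  private final List<File> failedDirs = new ArrayList<File>();

  /** Probe the failed dirs and promote any that have become usable again. */
  public synchronized void recheckFailedDirs() {
    for (Iterator<File> it = failedDirs.iterator(); it.hasNext();) {
      File dir = it.next();
      if (probe(dir)) {
        it.remove();
        goodDirs.add(dir);
      }
    }
  }

  /** Trivial health probe: can we create and delete a file in the directory? */
  private boolean probe(File dir) {
    try {
      File tmp = File.createTempFile("probe", null, dir);
      return tmp.delete();
    } catch (IOException e) {
      return false;
    }
  }
}
{code}

Hooking something like recheckFailedDirs() into whatever periodic disk health 
check the NM already runs is what would remove the need for a restart.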



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-09-19 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-90:
--
Attachment: apache-yarn-90.4.patch

Patch with findbugs fix

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), the NodeManager needs a restart. This JIRA is to improve the NodeManager 
> to reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140599#comment-14140599
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669983/YARN-913-006.patch
  against trunk revision 6fe5c6b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 34 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1267 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 9 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/5041//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-minikdc 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.yarn.registry.secure.TestSecureLogins
  
org.apache.hadoop.yarn.registry.operations.TestRegistryOperations
  org.apache.hadoop.yarn.server.TestMiniYARNClusterRegistry

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5041//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5041//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5041//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-registry.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5041//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5041//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, yarnregistry.pdf, 
> yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.
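For context, here is a minimal sketch of the pattern the last paragraph argues 
against, i.e. an application registering its own endpoint directly in ZooKeeper, 
shown via Apache Curator (the connection string, znode path, and payload are 
made up):

{code}
import java.nio.charset.StandardCharsets;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class SelfRegisterInZk {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk1.example.com:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // The app publishes where it came up; ephemeral, so it vanishes if the app dies.
    byte[] endpoint = "host42.example.com:8042".getBytes(StandardCharsets.UTF_8);
    client.create()
        .creatingParentsIfNeeded()
        .withMode(CreateMode.EPHEMERAL)
        .forPath("/services/myapp/instance-0001", endpoint);

    // ... serve until shutdown ...
    client.close();
  }
}
{code}

Every application doing this needs its own ZK write credentials, which is 
exactly the exposure that having the RM own the writes would avoid.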



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1143) Restrict the names that apps and types can have

2014-09-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140597#comment-14140597
 ] 

Steve Loughran commented on YARN-1143:
--

I don't think we need to do this except for the special case of apps that want 
to be listed in the registry. Even there, the fact that you can't impose any 
restrictions on usernames implies that (see the sketch below):
# punycoding will be needed to convert usernames to DNS-standard names.
# the standard DNS limit of 63 characters will have to be ignored.
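To make that concrete, here is a minimal sketch of the punycode step, assuming 
java.net.IDN is an acceptable way to produce the ASCII-compatible (xn--) form; 
the example username is made up, and treating its encoding as a DNS label is 
only an illustration of the two points above:

{code}
import java.net.IDN;

public class UserNameToDnsLabel {
  public static void main(String[] args) {
    // Hypothetical username containing non-ASCII characters
    String user = "bücher-team";
    // IDN.toASCII produces the ASCII-compatible (punycode, "xn--...") form
    String label = IDN.toASCII(user);
    System.out.println(label + " (length=" + label.length() + ")");
    // Nothing stops the encoded label from exceeding the 63-character DNS limit,
    // which is why that limit would have to be treated as advisory.
  }
}
{code}

Long or heavily non-ASCII usernames can easily blow past 63 characters once 
encoded, so the registry would have to relax that limit, as noted above.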

> Restrict the names that apps and types can have
> ---
>
> Key: YARN-1143
> URL: https://issues.apache.org/jira/browse/YARN-1143
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.1.0-beta
>Reporter: Steve Loughran
>Priority: Minor
>
> YARN-1001 is an example of a RESTy API to the RM's list of apps and app types 
> -and it shows that we may want to add some restrictions on the characters 
> allowed in an app name or type (or at least forbid some) -before it is too 
> late.
> If we don't do that, then tests should verify that you can have apps with 
> high-unicode names as well as other troublesome characters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1143) Restrict the names that apps and types can have

2014-09-19 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-1143.
--
   Resolution: Won't Fix
Fix Version/s: 2.6.0

> Restrict the names that apps and types can have
> ---
>
> Key: YARN-1143
> URL: https://issues.apache.org/jira/browse/YARN-1143
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.1.0-beta
>Reporter: Steve Loughran
>Priority: Minor
> Fix For: 2.6.0
>
>
> YARN-1001 is an example of a RESTy API to the RM's list of apps and app types 
> -and it shows that we may want to add some restrictions on the characters 
> allowed in an app name or type (or at least forbid some) -before it is too 
> late.
> If we don't do that, then tests should verify that you can have apps with 
> high-unicode names as well as other troublesome characters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2568) TestAMRMClientOnRMRestart test fails

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140555#comment-14140555
 ] 

Hudson commented on YARN-2568:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1876 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1876/])
YARN-2568. Fixed the potential test failures due to race conditions when RM 
work-preserving recovery is enabled. Contributed by Jian He. (zjshen: rev 
6fe5c6b746a40019b9a43676c33efec0f971c4b9)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java


> TestAMRMClientOnRMRestart test fails
> 
>
> Key: YARN-2568
> URL: https://issues.apache.org/jira/browse/YARN-2568
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.6.0
>
> Attachments: YARN-2568.patch, YARN-2568.patch
>
>
> testAMRMClientResendsRequestsOnRMRestart(org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart)
>   Time elapsed: 10.807 sec  <<< FAILURE!
> java.lang.AssertionError: Number of container should be 3 expected:<3> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:290)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2563) On secure clusters call to timeline server fails with authentication errors when running a job via oozie

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140552#comment-14140552
 ] 

Hudson commented on YARN-2563:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1876 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1876/])
YARN-2563. Fixed YarnClient to call getTimeLineDelegationToken only if the 
Token is not present. Contributed by Zhijie Shen (jianhe: rev 
eb92cc67dfaa51212fc5315b8db99effd046a154)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java


> On secure clusters call to timeline server fails with authentication errors 
> when running a job via oozie
> 
>
> Key: YARN-2563
> URL: https://issues.apache.org/jira/browse/YARN-2563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2563.1.patch, YARN-2563.2.patch
>
>
> During our nightlies on a secure cluster we have seen oozie jobs fail with 
> authentication errors against the timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140596#comment-14140596
 ] 

Hadoop QA commented on YARN-90:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12669994/apache-yarn-90.3.patch
  against trunk revision 6fe5c6b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5044//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5044//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5044//console

This message is automatically generated.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), the NodeManager needs a restart. This JIRA is to improve the NodeManager 
> to reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1779) Handle AMRMTokens across RM failover

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140575#comment-14140575
 ] 

Hudson commented on YARN-1779:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1876 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1876/])
YARN-1779. Fixed AMRMClient to handle AMRMTokens correctly across 
ResourceManager work-preserving-restart or failover. Contributed by Jian He. 
(vinodkv: rev a3d9934f916471a845dc679449d08f94dead550d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/TestClientRMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenSelector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java


> Handle AMRMTokens across RM failover
> 
>
> Key: YARN-1779
> URL: https://issues.apache.org/jira/browse/YARN-1779
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Jian He
>Priority: Blocker
>  Labels: ha
> Fix For: 2.6.0
>
> Attachments: YARN-1779.1.patch, YARN-1779.2.patch, YARN-1779.3.patch, 
> YARN-1779.6.patch
>
>
> Verify if AMRMTokens continue to work against RM failover. If not, we will 
> have to do something along the lines of YARN-986. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2363) Submitted applications occasionally lack a tracking URL

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140587#comment-14140587
 ] 

Hudson commented on YARN-2363:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1876 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1876/])
YARN-2363. Submitted applications occasionally lack a tracking URL. Contributed 
by Jason Lowe (jlowe: rev 9ea7b6c063c0bdd4551962e21d0173f671e9df03)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


> Submitted applications occasionally lack a tracking URL
> ---
>
> Key: YARN-2363
> URL: https://issues.apache.org/jira/browse/YARN-2363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.6.0
>
> Attachments: YARN-2363.patch
>
>
> Sometimes when an application is submitted the client receives no tracking 
> URL.  More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140563#comment-14140563
 ] 

Hudson commented on YARN-2561:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1876 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1876/])
YARN-2561. MR job client cannot reconnect to AM after NM restart. Contributed 
by Junping Du (jlowe: rev a337f0e3549351344bce70cb23ddc0a256c894b0)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt


> MR job client cannot reconnect to AM after NM restart.
> --
>
> Key: YARN-2561
> URL: https://issues.apache.org/jira/browse/YARN-2561
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Tassapol Athiapinya
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, 
> YARN-2561-v4.patch, YARN-2561-v5.patch, YARN-2561.patch
>
>
> Work-preserving NM restart is disabled.
> Submit a job, then restart the only NM; the job will hang with connection 
> retries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140542#comment-14140542
 ] 

Hudson commented on YARN-2001:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1876 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1876/])
YARN-2001. Added a time threshold for RM to wait before starting container 
allocations after restart/failover. Contributed by Jian He. (vinodkv: rev 
485c96e3cb9b0b05d6e490b4773506da83ebc61d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java


> Threshold for RM to accept requests from AM after failover
> --
>
> Key: YARN-2001
> URL: https://issues.apache.org/jira/browse/YARN-2001
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.6.0
>
> Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, 
> YARN-2001.4.patch, YARN-2001.5.patch, YARN-2001.5.patch, YARN-2001.5.patch
>
>
> After failover, the RM may require a certain threshold to determine whether it’s 
> safe to make scheduling decisions and start accepting new container requests 
> from AMs. The threshold could be a certain number of nodes, i.e. the RM waits 
> until a certain number of nodes have joined before accepting new container 
> requests. Or it could simply be a timeout; only after the timeout does the RM 
> accept new requests.
> NMs that join after the threshold can be treated as new NMs and instructed to 
> kill all their containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140508#comment-14140508
 ] 

Hudson commented on YARN-2561:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1901 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1901/])
YARN-2561. MR job client cannot reconnect to AM after NM restart. Contributed 
by Junping Du (jlowe: rev a337f0e3549351344bce70cb23ddc0a256c894b0)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


> MR job client cannot reconnect to AM after NM restart.
> --
>
> Key: YARN-2561
> URL: https://issues.apache.org/jira/browse/YARN-2561
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Tassapol Athiapinya
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, 
> YARN-2561-v4.patch, YARN-2561-v5.patch, YARN-2561.patch
>
>
> Work-preserving NM restart is disabled.
> Submit a job, then restart the only NM; the job will hang with connection 
> retries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140488#comment-14140488
 ] 

Hudson commented on YARN-2001:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1901 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1901/])
YARN-2001. Added a time threshold for RM to wait before starting container 
allocations after restart/failover. Contributed by Jian He. (vinodkv: rev 
485c96e3cb9b0b05d6e490b4773506da83ebc61d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java


> Threshold for RM to accept requests from AM after failover
> --
>
> Key: YARN-2001
> URL: https://issues.apache.org/jira/browse/YARN-2001
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.6.0
>
> Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, 
> YARN-2001.4.patch, YARN-2001.5.patch, YARN-2001.5.patch, YARN-2001.5.patch
>
>
> After failover, the RM may require a certain threshold to determine whether it’s 
> safe to make scheduling decisions and start accepting new container requests 
> from AMs. The threshold could be a certain number of nodes, i.e. the RM waits 
> until a certain number of nodes have joined before accepting new container 
> requests. Or it could simply be a timeout; only after the timeout does the RM 
> accept new requests.
> NMs that join after the threshold can be treated as new NMs and instructed to 
> kill all their containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1779) Handle AMRMTokens across RM failover

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140520#comment-14140520
 ] 

Hudson commented on YARN-1779:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1901 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1901/])
YARN-1779. Fixed AMRMClient to handle AMRMTokens correctly across 
ResourceManager work-preserving-restart or failover. Contributed by Jian He. 
(vinodkv: rev a3d9934f916471a845dc679449d08f94dead550d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenSelector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/TestClientRMProxy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java


> Handle AMRMTokens across RM failover
> 
>
> Key: YARN-1779
> URL: https://issues.apache.org/jira/browse/YARN-1779
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Jian He
>Priority: Blocker
>  Labels: ha
> Fix For: 2.6.0
>
> Attachments: YARN-1779.1.patch, YARN-1779.2.patch, YARN-1779.3.patch, 
> YARN-1779.6.patch
>
>
> Verify if AMRMTokens continue to work against RM failover. If not, we will 
> have to do something along the lines of YARN-986. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2563) On secure clusters call to timeline server fails with authentication errors when running a job via oozie

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140498#comment-14140498
 ] 

Hudson commented on YARN-2563:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1901 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1901/])
YARN-2563. Fixed YarnClient to call getTimeLineDelegationToken only if the 
Token is not present. Contributed by Zhijie Shen (jianhe: rev 
eb92cc67dfaa51212fc5315b8db99effd046a154)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* hadoop-yarn-project/CHANGES.txt


> On secure clusters call to timeline server fails with authentication errors 
> when running a job via oozie
> 
>
> Key: YARN-2563
> URL: https://issues.apache.org/jira/browse/YARN-2563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2563.1.patch, YARN-2563.2.patch
>
>
> During our nightlies on a secure cluster we have seen oozie jobs fail with 
> authentication errors against the timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2363) Submitted applications occasionally lack a tracking URL

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140533#comment-14140533
 ] 

Hudson commented on YARN-2363:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1901 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1901/])
YARN-2363. Submitted applications occasionally lack a tracking URL. Contributed 
by Jason Lowe (jlowe: rev 9ea7b6c063c0bdd4551962e21d0173f671e9df03)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


> Submitted applications occasionally lack a tracking URL
> ---
>
> Key: YARN-2363
> URL: https://issues.apache.org/jira/browse/YARN-2363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.6.0
>
> Attachments: YARN-2363.patch
>
>
> Sometimes when an application is submitted the client receives no tracking 
> URL.  More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2568) TestAMRMClientOnRMRestart test fails

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140501#comment-14140501
 ] 

Hudson commented on YARN-2568:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1901 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1901/])
YARN-2568. Fixed the potential test failures due to race conditions when RM 
work-preserving recovery is enabled. Contributed by Jian He. (zjshen: rev 
6fe5c6b746a40019b9a43676c33efec0f971c4b9)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java


> TestAMRMClientOnRMRestart test fails
> 
>
> Key: YARN-2568
> URL: https://issues.apache.org/jira/browse/YARN-2568
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.6.0
>
> Attachments: YARN-2568.patch, YARN-2568.patch
>
>
> testAMRMClientResendsRequestsOnRMRestart(org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart)
>   Time elapsed: 10.807 sec  <<< FAILURE!
> java.lang.AssertionError: Number of container should be 3 expected:<3> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:290)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140472#comment-14140472
 ] 

Steve Loughran commented on YARN-913:
-

The test result is spurious: the build was cancelled.

Test output from a full run:
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.yarn.registry.client.binding.TestMarshalling
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.463 sec - in 
org.apache.hadoop.yarn.registry.client.binding.TestMarshalling
Running org.apache.hadoop.yarn.registry.client.binding.TestRegistryPathUtils
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.291 sec - in 
org.apache.hadoop.yarn.registry.client.binding.TestRegistryPathUtils
Running 
org.apache.hadoop.yarn.registry.client.services.TestMicroZookeeperService
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.522 sec - in 
org.apache.hadoop.yarn.registry.client.services.TestMicroZookeeperService
Running org.apache.hadoop.yarn.registry.client.services.TestCuratorService
Tests run: 20, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.852 sec - in 
org.apache.hadoop.yarn.registry.client.services.TestCuratorService
Running org.apache.hadoop.yarn.registry.operations.TestRegistryOperations
Tests run: 23, Failures: 0, Errors: 14, Skipped: 0, Time elapsed: 3.075 sec <<< 
FAILURE! - in org.apache.hadoop.yarn.registry.operations.TestRegistryOperations
testLsEmptyPath(org.apache.hadoop.yarn.registry.operations.TestRegistryOperations)
  Time elapsed: 0.449 sec  <<< ERROR!
java.lang.Exception: Unexpected exception, expected<...> but was<...>
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:99)
at 
org.apache.hadoop.yarn.registry.client.binding.RegistryPathUtils.validateZKPath(RegistryPathUtils.java:54)
at 
org.apache.hadoop.yarn.registry.client.binding.RegistryPathUtils.createFullPath(RegistryPathUtils.java:90)
at 
org.apache.hadoop.yarn.registry.client.services.zk.CuratorService.createFullPath(CuratorService.java:304)
at 
org.apache.hadoop.yarn.registry.client.services.zk.CuratorService.zkList(CuratorService.java:696)
at 
org.apache.hadoop.yarn.registry.client.services.RegistryOperationsService.list(RegistryOperationsService.java:173)
at 
org.apache.hadoop.yarn.registry.operations.TestRegistryOperations.testLsEmptyPath(TestRegistryOperations.java:119)

testDeleteNonEmpty(org.apache.hadoop.yarn.registry.operations.TestRegistryOperations)
  Time elapsed: 0.181 sec  <<< ERROR!
org.apache.hadoop.yarn.registry.client.exceptions.RegistryIOException: 
`//users/devteam/org-apache-hadoop/hdfs': Failure of existence check on 
//users/devteam/org-apache-hadoop/hdfs: 
org.apache.hadoop.yarn.registry.client.exceptions.InvalidPathnameException: 
`/registry//users/devteam/org-apache-hadoop/hdfs': Invalid Path 
"/registry//users/devteam/org-apache-hadoop/hdfs" : 
java.lang.IllegalArgumentException: Invalid path string 
"/registry//users/devteam/org-apache-hadoop/hdfs" caused by empty node name 
specified @10: Invalid path string 
"/registry//users/devteam/org-apache-hadoop/hdfs" caused by empty node name 
specified @10: `/registry//users/devteam/org-apache-hadoop/hdfs': Invalid Path 
"/registry//users/devteam/org-apache-hadoop/hdfs" : 
java.lang.IllegalArgumentException: Invalid path string 
"/registry//users/devteam/org-apache-hadoop/hdfs" caused by empty node name 
specified @10: Invalid path string 
"/registry//users/devteam/org-apache-hadoop/hdfs" caused by empty node name 
specified @10
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:99)
at 
org.apache.hadoop.yarn.registry.client.binding.RegistryPathUtils.validateZKPath(RegistryPathUtils.java:54)
at 
org.apache.hadoop.yarn.registry.client.binding.RegistryPathUtils.createFullPath(RegistryPathUtils.java:90)
at 
org.apache.hadoop.yarn.registry.client.services.zk.CuratorService.createFullPath(CuratorService.java:304)
at 
org.apache.hadoop.yarn.registry.client.services.zk.CuratorService.zkStat(CuratorService.java:469)
at 
org.apache.hadoop.yarn.registry.client.services.zk.CuratorService.zkPathExists(CuratorService.java:511)
at 
org.apache.hadoop.yarn.registry.client.services.zk.CuratorService.zkSet(CuratorService.java:645)
at 
org.apache.hadoop.yarn.registry.client.services.RegistryOperationsService.create(RegistryOperationsService.java:135)
at 
org.apache.hadoop.yarn.registry.AbstractRegistryTest.putExampleServiceEntry(AbstractRegistryTest.java:88)
at 
org.apache.hadoop.yarn.registry.AbstractRegistryTest.putExampleServiceEntry(AbstractRegistryTest.java:70)
at 
org.apache.hadoop.yarn.registry.operations.TestRegistryOperations.testDeleteNonEmpty(TestRegistryOperations.java:100)

testP

[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-09-19 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-90:
--
Attachment: apache-yarn-90.3.patch

Rebased the patch onto trunk, with small improvements to attempt cleanups on full 
directories.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), the NodeManager needs a restart. This JIRA is to improve the NodeManager 
> to reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140438#comment-14140438
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669983/YARN-913-006.patch
  against trunk revision 6fe5c6b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 34 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1267 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5042//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5042//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5042//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, yarnregistry.pdf, 
> yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140435#comment-14140435
 ] 

Hadoop QA commented on YARN-2198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12669988/YARN-2198.delta.7.patch
  against trunk revision 6fe5c6b.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5043//console

This message is automatically generated.

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> --
>
> Key: YARN-2198
> URL: https://issues.apache.org/jira/browse/YARN-2198
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
> YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
> YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
> YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch
>
>
> YARN-1972 introduces a Secure Windows Container Executor. However, this 
> executor requires the process launching the container to be LocalSystem or 
> a member of the local Administrators group. Since the process in question 
> is the NodeManager, the requirement translates to the entire NM running as a 
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140434#comment-14140434
 ] 

Junping Du commented on YARN-668:
-

bq. Drop all the getters from each tokens to avoid more leaking in the future. 
We don't need any getters() exposed.
Hmm... I think these getters are still useful: they are pretty handy for converting 
from the proto object to a normal one. Maybe we can keep them, just as the other 
PBImpls do?
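For reference, a minimal sketch of the shape being discussed, i.e. a Writable 
token identifier whose fields live in a protobuf payload, with a getter that 
delegates to the proto. MyTokenIdentifierProto is a hypothetical generated 
message, not a class from the patch:

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class ProtoBackedTokenIdentifier implements Writable {
  // Hypothetical protobuf-generated message holding the identifier's fields
  private MyTokenIdentifierProto proto = MyTokenIdentifierProto.getDefaultInstance();

  @Override
  public void write(DataOutput out) throws IOException {
    byte[] payload = proto.toByteArray();   // the whole payload is serialized as PB
    out.writeInt(payload.length);
    out.write(payload);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    byte[] payload = new byte[in.readInt()];
    in.readFully(payload);
    // PB preserves unknown fields, so identifiers written by a newer version survive
    proto = MyTokenIdentifierProto.parseFrom(payload);
  }

  // The kind of convenience getter being discussed: it only delegates to the proto
  public String getUser() {
    return proto.getUser();
  }
}
{code}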

> TokenIdentifier serialization should consider Unknown fields
> 
>
> Key: YARN-668
> URL: https://issues.apache.org/jira/browse/YARN-668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-668-demo.patch
>
>
> This would allow changing of the TokenIdentifier between versions. The 
> current serialization is Writable. A simple way to achieve this would be to 
> have a Proto object as the payload for TokenIdentifiers, instead of 
> individual fields.
> TokenIdentifier continues to implement Writable to work with the RPC layer - 
> but the payload itself is serialized using PB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-19 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: YARN-2198.delta.7.patch

This fixes the YARN-2553 issues around KillTask. Also, elevated chmod was not 
implemented (D'oh!) and the WSCE dir perms were wrong (0710, inherited from DCE; 
they need to be 0750, similar to LCE).

I will not upload trunk.7.patch for the moment, given that Jenkins cannot handle 
the CRLF mismatch in the .vcxproj and .sln files. I'm following up on that issue 
with HW engineers.

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> --
>
> Key: YARN-2198
> URL: https://issues.apache.org/jira/browse/YARN-2198
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
> YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
> YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
> YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch
>
>
> YARN-1972 introduces a Secure Windows Container Executor. However, this 
> executor requires the process launching the container to be LocalSystem or 
> a member of the local Administrators group. Since the process in question 
> is the NodeManager, the requirement translates to the entire NM running as a 
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2553) Windows Secure Container Executor: assign PROCESS_TERMINATE privilege to NM on created containers

2014-09-19 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu resolved YARN-2553.

Resolution: Not a Problem

After further investigation I concluded that there is no way to prevent the 
access_denied on the job object during container shutdown. I moved the kill 
task code inside hadoopwinutils, running as LocalSystem with the SeDebug 
privilege enabled, and even after LocalSystem is explicitly granted 
JOB_OBJECT_ALL_ACCESS on the job, it still gets access denied.
I fixed the kill task to return success in this case and commented the issue 
in the code. The fixed code will be in the next patch of YARN-2198.

> Windows Secure Container Executor: assign PROCESS_TERMINATE privilege to NM 
> on created containers
> -
>
> Key: YARN-2553
> URL: https://issues.apache.org/jira/browse/YARN-2553
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows, wsce
>
> In order to open a job handle with JOB_OBJECT_TERMINATE access, the caller 
> must have PROCESS_TERMINATE access on the handle of each process in the job 
> (MSDN 
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686709(v=vs.85).aspx)
>  .
> hadoopwinutilsvc process should explicitly grant PROCESS_TERMINATE access to 
> NM account on the newly started container process. I hope this gets 
> inherited...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-19 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-006.patch

Patch -006: fixes the dshell test. No idea why the others are failing, and we 
still can't get the results yet.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, yarnregistry.pdf, 
> yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2363) Submitted applications occasionally lack a tracking URL

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140381#comment-14140381
 ] 

Hudson commented on YARN-2363:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #685 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/685/])
YARN-2363. Submitted applications occasionally lack a tracking URL. Contributed 
by Jason Lowe (jlowe: rev 9ea7b6c063c0bdd4551962e21d0173f671e9df03)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


> Submitted applications occasionally lack a tracking URL
> ---
>
> Key: YARN-2363
> URL: https://issues.apache.org/jira/browse/YARN-2363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.6.0
>
> Attachments: YARN-2363.patch
>
>
> Sometimes when an application is submitted the client receives no tracking 
> URL.  More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1779) Handle AMRMTokens across RM failover

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140369#comment-14140369
 ] 

Hudson commented on YARN-1779:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #685 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/685/])
YARN-1779. Fixed AMRMClient to handle AMRMTokens correctly across 
ResourceManager work-preserving-restart or failover. Contributed by Jian He. 
(vinodkv: rev a3d9934f916471a845dc679449d08f94dead550d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenSelector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/TestClientRMProxy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java


> Handle AMRMTokens across RM failover
> 
>
> Key: YARN-1779
> URL: https://issues.apache.org/jira/browse/YARN-1779
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Jian He
>Priority: Blocker
>  Labels: ha
> Fix For: 2.6.0
>
> Attachments: YARN-1779.1.patch, YARN-1779.2.patch, YARN-1779.3.patch, 
> YARN-1779.6.patch
>
>
> Verify if AMRMTokens continue to work against RM failover. If not, we will 
> have to do something along the lines of YARN-986. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2568) TestAMRMClientOnRMRestart test fails

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140349#comment-14140349
 ] 

Hudson commented on YARN-2568:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #685 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/685/])
YARN-2568. Fixed the potential test failures due to race conditions when RM 
work-preserving recovery is enabled. Contributed by Jian He. (zjshen: rev 
6fe5c6b746a40019b9a43676c33efec0f971c4b9)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java


> TestAMRMClientOnRMRestart test fails
> 
>
> Key: YARN-2568
> URL: https://issues.apache.org/jira/browse/YARN-2568
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.6.0
>
> Attachments: YARN-2568.patch, YARN-2568.patch
>
>
> testAMRMClientResendsRequestsOnRMRestart(org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart)
>   Time elapsed: 10.807 sec  <<< FAILURE!
> java.lang.AssertionError: Number of container should be 3 expected:<3> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:290)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140356#comment-14140356
 ] 

Hudson commented on YARN-2561:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #685 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/685/])
YARN-2561. MR job client cannot reconnect to AM after NM restart. Contributed 
by Junping Du (jlowe: rev a337f0e3549351344bce70cb23ddc0a256c894b0)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java


> MR job client cannot reconnect to AM after NM restart.
> --
>
> Key: YARN-2561
> URL: https://issues.apache.org/jira/browse/YARN-2561
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Tassapol Athiapinya
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, 
> YARN-2561-v4.patch, YARN-2561-v5.patch, YARN-2561.patch
>
>
> Work-preserving NM restart is disabled.
> Submit a job, restart the only NM, and observe that the job hangs with connect 
> retries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140336#comment-14140336
 ] 

Hudson commented on YARN-2001:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #685 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/685/])
YARN-2001. Added a time threshold for RM to wait before starting container 
allocations after restart/failover. Contributed by Jian He. (vinodkv: rev 
485c96e3cb9b0b05d6e490b4773506da83ebc61d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java


> Threshold for RM to accept requests from AM after failover
> --
>
> Key: YARN-2001
> URL: https://issues.apache.org/jira/browse/YARN-2001
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.6.0
>
> Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch, 
> YARN-2001.4.patch, YARN-2001.5.patch, YARN-2001.5.patch, YARN-2001.5.patch
>
>
> After failover, the RM may need to wait for a certain threshold before it is 
> safe to make scheduling decisions and start accepting new container requests 
> from AMs. The threshold could be a certain number of nodes, i.e. the RM waits 
> until a certain number of nodes have joined before accepting new container 
> requests. Or it could simply be a timeout; only after the timeout does the RM 
> accept new requests.
> NMs that join after the threshold can be treated as new NMs and instructed to 
> kill all their containers.
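
As a rough illustration, assuming the configuration key introduced by this patch keeps the name yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms (see yarn-default.xml in the commit for the authoritative key and default), a deployment could lengthen the wait like this:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulingWaitSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Sketch: make the RM wait 30s after restart/failover before the scheduler
    // starts allocating, so most NMs have a chance to re-register first.
    // Key name assumed from this patch; verify against yarn-default.xml.
    conf.setLong(
        "yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms",
        30000L);
  }
}
{code}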



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2563) On secure clusters call to timeline server fails with authentication errors when running a job via oozie

2014-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140346#comment-14140346
 ] 

Hudson commented on YARN-2563:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #685 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/685/])
YARN-2563. Fixed YarnClient to call getTimeLineDelegationToken only if the 
Token is not present. Contributed by Zhijie Shen (jianhe: rev 
eb92cc67dfaa51212fc5315b8db99effd046a154)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* hadoop-yarn-project/CHANGES.txt


> On secure clusters call to timeline server fails with authentication errors 
> when running a job via oozie
> 
>
> Key: YARN-2563
> URL: https://issues.apache.org/jira/browse/YARN-2563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2563.1.patch, YARN-2563.2.patch
>
>
> During our nightlies on a secure cluster we have seen oozie jobs fail with 
> authentication errors against the timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140331#comment-14140331
 ] 

Hadoop QA commented on YARN-796:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12669943/YARN-796.node-label.consolidate.8.patch
  against trunk revision 6fe5c6b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 37 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5040//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5040//console

This message is automatically generated.

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.
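
Purely hypothetical usage sketch: the API is still being reviewed in this JIRA, so the setter name below is a guess and not the committed interface.

{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

public class LabeledRequestSketch {
  // Ask for one 4 GB / 2 vcore container anywhere, restricted to nodes an admin
  // has labeled "GPU". setNodeLabelExpression is a hypothetical method name.
  public static ResourceRequest gpuRequest() {
    ResourceRequest req = Records.newRecord(ResourceRequest.class);
    req.setResourceName(ResourceRequest.ANY);
    req.setCapability(Resource.newInstance(4096, 2));
    req.setNumContainers(1);
    req.setPriority(Priority.newInstance(0));
    req.setNodeLabelExpression("GPU");
    return req;
  }
}
{code}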



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2570) Refactor code to not use SchedulerResourceTypes protobuf enum directly

2014-09-19 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-2570:
---

 Summary: Refactor code to not use SchedulerResourceTypes protobuf 
enum directly
 Key: YARN-2570
 URL: https://issues.apache.org/jira/browse/YARN-2570
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev


From MAPREDUCE-5279,

{quote}
2. It's not related to this patch, but I think we need to fix the problem: 
SchedulerResourceTypes is generated by protobuf, and we shouldn't refer to it 
directly, since that could break binary compatibility if we upgrade protobuf to 
a new version. The other enums in proto adopt the following approach: define a 
SchedulerResourceTypes java enum and a SchedulerResourceTypesProto protobuf 
enum, and in ProtoUtils define convertTo/FromProtoFormat methods to convert one 
object to the other. Please file a ticket for this issue.
{quote}
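
A minimal sketch of that convention, with made-up names (FooType / FooTypeProto) standing in for the real SchedulerResourceTypes pair:

{code}
// Made-up names; the real code would pair SchedulerResourceTypes (java enum)
// with SchedulerResourceTypesProto (protobuf-generated enum) the same way.
public enum FooType { MEMORY, CPU }

// Stand-in for the protobuf-generated enum.
enum FooTypeProto { MEMORY, CPU }

final class FooProtoUtils {
  private FooProtoUtils() { }

  // Only the wrapper enum is exposed to callers; the proto enum stays internal.
  public static FooTypeProto convertToProtoFormat(FooType e) {
    return FooTypeProto.valueOf(e.name());
  }

  public static FooType convertFromProtoFormat(FooTypeProto p) {
    return FooType.valueOf(p.name());
  }
}
{code}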



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-09-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140295#comment-14140295
 ] 

Wangda Tan commented on YARN-2056:
--

Hi [~eepayne],
Thanks for updating. I've taken a look at your patch and completely rethought 
it. Some ideas:

In the existing preemption logic, {{resetCapacity}} is a static calculation, 
unrelated to the resource usage of any queue.
With the result calculated by {{resetCapacity}}, {{computeFixpointAllocation}} 
makes sure that: *if any resource is left unallocated, it is assigned to the 
existing queues according to their calculated normalized_guarantee*. This is the 
foundation of the subsequent calculations, such as getting {{toBePreempted}} for 
each queue, so we shouldn't break it.

IMHO, the right place to put the logic that reserves resources for 
un-preemptable queues is not {{resetCapacity}}; it should be in 
{{computeFixpointAllocation}}.

I think adding {{preemptableExtra}} to each TempQueue, and having a ParentQueue 
accumulate the {{preemptableExtra}} of its children, is a good idea.

What I propose to do in {{computeFixpointAllocation}}:

{code}
computeFixpointAllocation {
...

+   for (q in qAlloc) {
+   if (q.disablePreempt) {
+   q.ideal_assigned = q.current
+   unassigned = unassigned - q.ideal_assigned
+   }
+   }

while (!qAlloc.isEmpty()
&& Resources.greaterThan(rc, tot_guarant, unassigned, Resources.none())
&& Resources.greaterThan(rc, tot_guarant, wQassigned, 
Resources.none())) {
...
}
}
{code}

And in the {{while (!qAlloc.isEmpty())}} loop above, we need to take into 
account that some queues' ideal_assigned starts from 0, while other queues' 
starts from current.

Does this make sense to you?

Wangda

> Disable preemption at Queue level
> -
>
> Key: YARN-2056
> URL: https://issues.apache.org/jira/browse/YARN-2056
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Mayank Bansal
>Assignee: Eric Payne
> Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
> YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
> YARN-2056.201409181916.txt
>
>
> We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.8.patch

Attached a new patch that fixes the javac warnings, findbugs warnings, and test failures.

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)