[jira] [Updated] (YARN-2161) Fix build on macosx: YARN parts

2014-09-24 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated YARN-2161:

Attachment: YARN-2161.v2.patch

Thanks for the review, Allen. Attaching a new version of the patch; changes:
1. Use cmake CHECK_FUNCTION_EXISTS to check whether fcloseall exists
2. Change user bin to user daemon (which both Linux and macosx have)
3. Add the fixes from YARN-1327 (setpgid and libgen.h)


> Fix build on macosx: YARN parts
> ---
>
> Key: YARN-2161
> URL: https://issues.apache.org/jira/browse/YARN-2161
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2161.v1.patch, YARN-2161.v2.patch
>
>
> When compiling on macosx with -Pnative, there are several warnings and errors; 
> fixing these would help Hadoop developers working in a macosx environment. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2161) Fix build on macosx: YARN parts

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146037#comment-14146037
 ] 

Hadoop QA commented on YARN-2161:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670922/YARN-2161.v2.patch
  against trunk revision ef784a2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5094//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5094//console

This message is automatically generated.

> Fix build on macosx: YARN parts
> ---
>
> Key: YARN-2161
> URL: https://issues.apache.org/jira/browse/YARN-2161
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2161.v1.patch, YARN-2161.v2.patch
>
>
> When compiling on macosx with -Pnative, there are several warnings and errors; 
> fixing these would help Hadoop developers working in a macosx environment. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field

2014-09-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146083#comment-14146083
 ] 

Rohith commented on YARN-2523:
--

Thank you Jason Lowe for your suggestion.

Considering your point, I did some more tests without my patch. 
* 1 Add hosts to the include list only and refresh nodes. Decommissioned nodes is 1. 
If refreshNodes is called again, then Decommissioned nodes is 0.
* 2 Add hosts to the include list only and refresh nodes. Decommissioned nodes is 1. 
If the RM is restarted, then Decommissioned nodes is 0. But here the RM can't get the 
old value unless it is stored in ZooKeeper.
* 3 Add hosts to the exclude list only and refresh nodes. Decommissioned nodes is 1. 
Remove the hosts from the exclude list and refresh nodes. Start the NodeManager. 
Decommissioned nodes is -1.

Setting the decommissioned-node count on refreshNodes causes the problem. Having 
RMNodeImpl set it while deactivating a node will be fine. For RM restart, setting it 
at serviceInit holds good.
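A minimal illustration of that direction (hypothetical names; the real change would go 
through RMNodeImpl and the cluster metrics, not a standalone class like this):

{code}
// Hypothetical sketch: bump the decommissioned-node counter only when a node is
// actually deactivated as DECOMMISSIONED, instead of recomputing it on refreshNodes.
import java.util.concurrent.atomic.AtomicInteger;

class DecommissionMetricsSketch {
  private final AtomicInteger decommissionedNodes = new AtomicInteger();

  // Called from the node state machine when a node transitions to DECOMMISSIONED.
  void onNodeDecommissioned() {
    decommissionedNodes.incrementAndGet();
  }

  // Called when a previously decommissioned node rejoins the cluster.
  void onNodeRejoined() {
    // Guard against going negative, which is the symptom seen in the RM UI.
    decommissionedNodes.updateAndGet(v -> Math.max(0, v - 1));
  }

  int get() {
    return decommissionedNodes.get();
  }
}
{code}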


> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> --
>
> Key: YARN-2523
> URL: https://issues.apache.org/jira/browse/YARN-2523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 3.0.0
>Reporter: Nishan Shetty
>Assignee: Rohith
> Attachments: YARN-2523.patch, YARN-2523.patch
>
>
> 1. Decommission one NodeManager by configuring its IP in the excludehost file
> 2. Remove the IP from the excludehost file
> 3. Execute the -refreshNodes command and restart the decommissioned NodeManager
> Observe that the RM UI shows a negative value for the "Decommissioned Nodes" field



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2594) ResourceManager sometimes becomes unresponsive

2014-09-24 Thread Karam Singh (JIRA)
Karam Singh created YARN-2594:
-

 Summary: ResourceManager sometimes becomes unresponsive
 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh


ResourceManager sometimes becomes unresponsive:
There was no exception in the ResourceManager log, and it contains only the following 
type of messages:
{code}
2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
(AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146211#comment-14146211
 ] 

Hudson commented on YARN-2569:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #690 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/690/])
YARN-2569. Added the log handling APIs for the long running services. 
Contributed by Xuan Gong. (zjshen: rev 5338ac416ab8ab3e7e0a7bfb4a53151fc457f673)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto


> Log Handling for LRS API Changes
> 
>
> Key: YARN-2569
> URL: https://issues.apache.org/jira/browse/YARN-2569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
> YARN-2569.4.1.patch, YARN-2569.4.patch, YARN-2569.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field

2014-09-24 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2523:
-
Attachment: YARN-2523.1.patch

Updated the patch to handle the tests mentioned in my previous comment. Please 
review.

> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> --
>
> Key: YARN-2523
> URL: https://issues.apache.org/jira/browse/YARN-2523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 3.0.0
>Reporter: Nishan Shetty
>Assignee: Rohith
> Attachments: YARN-2523.1.patch, YARN-2523.patch, YARN-2523.patch
>
>
> 1. Decommission one NodeManager by configuring its IP in the excludehost file
> 2. Remove the IP from the excludehost file
> 3. Execute the -refreshNodes command and restart the decommissioned NodeManager
> Observe that the RM UI shows a negative value for the "Decommissioned Nodes" field



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2595) NullPointerException is thrown while RM shutdown

2014-09-24 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2595:
---

 Summary: NullPointerException is thrown while RM shutdown
 Key: YARN-2595
 URL: https://issues.apache.org/jira/browse/YARN-2595
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Priority: Minor


2014-08-03 09:45:55,110 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread 
java.lang.NullPointerException 
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.writeAuditLog(RMAppManager.java:221)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:213)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:480)
 
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:71)
 
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) 
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
at java.lang.Thread.run(Thread.java:662) 
2014-08-03 09:45:55,111 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos 
OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2595) NullPointerException is thrown while RM shutdown

2014-09-24 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146263#comment-14146263
 ] 

Devaraj K commented on YARN-2595:
-

[~nishan] Thanks for reporting this issue. Can you also provide the ResourceManager 
log, or at least some log content above this exception? Thanks.

> NullPointerException is thrown while RM shutdown
> 
>
> Key: YARN-2595
> URL: https://issues.apache.org/jira/browse/YARN-2595
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Nishan Shetty
>Priority: Minor
>
> 2014-08-03 09:45:55,110 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread 
> java.lang.NullPointerException 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.writeAuditLog(RMAppManager.java:221)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:213)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:480)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:71)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:662) 
> 2014-08-03 09:45:55,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos 
> OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2594) ResourceManager sometimes becomes unresponsive

2014-09-24 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-2594:
---

Assignee: Devaraj K

> ResourceManager sometimes becomes unresponsive
> -
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karam Singh
>Assignee: Devaraj K
>
> ResourceManager sometimes becomes unresponsive:
> There was no exception in the ResourceManager log, and it contains only the following 
> type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2129) Add scheduling priority to the WindowsSecureContainerExecutor

2014-09-24 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146275#comment-14146275
 ] 

Remus Rusanu commented on YARN-2129:


A better alternative implementation would be to use 
JOB_OBJECT_LIMIT_PRIORITY_CLASS on the container job.

> Add scheduling priority to the WindowsSecureContainerExecutor
> -
>
> Key: YARN-2129
> URL: https://issues.apache.org/jira/browse/YARN-2129
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-2129.1.patch, YARN-2129.2.patch
>
>
> The WCE (YARN-1972) could and should honor 
> NM_CONTAINER_EXECUTOR_SCHED_PRIORITY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146282#comment-14146282
 ] 

Hadoop QA commented on YARN-2523:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670954/YARN-2523.1.patch
  against trunk revision ef784a2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5095//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5095//console

This message is automatically generated.

> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> --
>
> Key: YARN-2523
> URL: https://issues.apache.org/jira/browse/YARN-2523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 3.0.0
>Reporter: Nishan Shetty
>Assignee: Rohith
> Attachments: YARN-2523.1.patch, YARN-2523.patch, YARN-2523.patch
>
>
> 1. Decommission one NodeManager by configuring its IP in the excludehost file
> 2. Remove the IP from the excludehost file
> 3. Execute the -refreshNodes command and restart the decommissioned NodeManager
> Observe that the RM UI shows a negative value for the "Decommissioned Nodes" field



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146321#comment-14146321
 ] 

Wangda Tan commented on YARN-796:
-

Had an offline discussion with [~cwelch] today, based on Craig's comment on 
YARN-2496: 
https://issues.apache.org/jira/browse/YARN-2496?focusedCommentId=14143993&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14143993.
 I think it's better to put it here for more discussion.

*A simple summary of the problem is:*
Currently queues and nodes have labels, and a queue may not be able to access all nodes 
in the cluster, so the headroom might be less than the headroom calculated today.
In YARN-2496 the headroom calculation was changed to {{headroom = min(headroom, 
total-resource-of-the-queue-can-access)}}.
However, this may not be enough: an application may set the label it requires (e.g. 
label-expression = GPU && LARGE_MEMORY). It's better to return headroom according to 
the application's label expression to avoid resource deadlock and similar problems.
We have two problems to support this:
# There can be thousands of label-expression combinations, so the headroom calculation 
becomes very expensive when many applications are running and asking for different 
labels at the same time.
# A single application can ask for different label expressions for different containers 
(e.g. mappers need GPU but reducers do not), so a single headroom returned by 
AllocateResponse may not be enough.

*Proposed solutions:*
Solution #1:
Assume a relatively small number of unique label expressions can satisfy most 
applications. We can add an option in capacity-scheduler.xml where users list the 
label-expressions that need to be pre-calculated; the number of such label-expressions 
should be small (say <= 100 in the whole cluster). NodeLabelManager will update them 
when a node joins, leaves, or its labels change.
We also add a new field in AllocateResponse, like {{Map labelExpToHeadroom}}. We will 
return the list of pre-calculated headrooms to the AM, and the AM can decide how to use 
them (see the sketch after solution #2 below).

Solution #2:
The AM receives updated nodes (a list of NodeReport) from the RM in AllocateResponse, 
and the AM itself can figure out the headroom for a specified label-expression from the 
updated NMs. This is simpler than #1, but the AM side needs to implement its own logic 
to support it.
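A minimal sketch of the pre-calculated headroom map from solution #1 (the class and 
method names are illustrative only, not the actual YARN API):

{code}
// Illustrative only: keep a pre-calculated headroom per configured label expression,
// updated when nodes join/leave or labels change, and snapshot it into the allocate
// response so the AM can pick the entry matching each request's label expression.
import java.util.HashMap;
import java.util.Map;

public class LabelExpressionHeadroomSketch {
  // label expression (e.g. "GPU && LARGE_MEMORY") -> pre-calculated headroom in MB
  private final Map<String, Long> labelExpToHeadroomMb = new HashMap<String, Long>();

  // Called by a NodeLabelManager-like component on node join/leave or label change.
  public synchronized void update(String labelExpression,
                                  long accessibleResourceMb,
                                  long usedByQueueMb) {
    labelExpToHeadroomMb.put(labelExpression,
        Math.max(0L, accessibleResourceMb - usedByQueueMb));
  }

  // Copied into a (hypothetical) AllocateResponse field for the AM to consume.
  public synchronized Map<String, Long> snapshot() {
    return new HashMap<String, Long>(labelExpToHeadroomMb);
  }
}
{code}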

Hope to get more thoughts about this,

Thanks,
Wangda

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.

2014-09-24 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2588:
-
Attachment: YARN-2588.patch

Updated the patch to fix the issue. Please review.

> Standby RM does not transitionToActive if previous transitionToActive is 
> failed with ZK exception.
> --
>
> Key: YARN-2588
> URL: https://issues.apache.org/jira/browse/YARN-2588
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0, 2.6.0, 2.5.1
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-2588.patch
>
>
> Consider a scenario where the standby RM fails to transition to Active because 
> of a ZK exception (ConnectionLoss or SessionExpired). Then any further 
> transition to Active for the same RM does not move the RM to the Active state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2494) [YARN-796] Node label manager API and storage implementations

2014-09-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146333#comment-14146333
 ] 

Wangda Tan commented on YARN-2494:
--

[~cwelch],
I am replying to your comment from YARN-796 here; I think it's an implementation detail 
of FileSystemNodeLabelManager:
bq. It looks like the FileSystemNodeLabelManager will just append changes to 
the edit log forever, until it is restarted, is that correct? If so, a 
long-running cluster with lots of changes could result in a rather large edit 
log. I think every so many writes (N writes) a recovery should be "forced" to 
clean up the edit log and consolidate state (do a recover...)

It's a good suggestion, but it looks more like an enhancement to me. A rough estimate: 
if we have 10,000 node label changes per hour, the average size of a record is 16 bytes 
(8 for the label and 8 for the node), and the cluster runs for one year, then the size 
of the editlog will be {{10,000 * 16 * 24 * 365 / 1024 / 1024}} ≈ 1336 MB. Given 
existing HDFS read throughput (we can get at least 50 MB/sec), it is acceptable to me 
if restarting an RM that ran for a whole year costs about 30s of extra time.
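As a quick sanity check of that arithmetic (a throwaway snippet, not part of any patch):

{code}
// Recomputes the back-of-the-envelope editlog estimate from the comment above.
public class EditLogEstimate {
  public static void main(String[] args) {
    long changesPerHour = 10000L;
    long bytesPerRecord = 16L;           // 8 for the label + 8 for the node
    long hoursPerYear = 24L * 365L;
    double sizeMb = changesPerHour * bytesPerRecord * hoursPerYear / (1024.0 * 1024.0);
    double readSeconds = sizeMb / 50.0;  // assuming ~50 MB/sec HDFS read throughput
    System.out.printf("editlog ~%.1f MB, ~%.0f s to replay at RM restart%n",
        sizeMb, readSeconds);
    // prints: editlog ~1336.7 MB, ~27 s to replay at RM restart
  }
}
{code}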

I agree that periodically creating a new mirror and cleaning up the editlog is better 
than this; we can do it once other higher-priority problems are addressed.

Thanks,
Wangda

> [YARN-796] Node label manager API and storage implementations
> -
>
> Key: YARN-2494
> URL: https://issues.apache.org/jira/browse/YARN-2494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, 
> YARN-2494.patch, YARN-2494.patch
>
>
> This JIRA includes the APIs and storage implementations of the node label manager.
> NodeLabelManager is an abstract class used to manage labels of nodes in the 
> cluster; it has APIs to query/modify
> - Nodes according to a given label
> - Labels according to a given hostname
> - Add/remove labels
> - Set labels of nodes in the cluster
> - Persist/recover changes of labels/labels-on-nodes to/from storage
> And it has two implementations to store modifications:
> - Memory based storage: it will not persist changes, so all labels will be 
> lost when the RM restarts
> - FileSystem based storage: it will persist/recover to/from a FileSystem (like 
> HDFS), and all labels and labels-on-nodes will be recovered upon RM restart
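A rough sketch of the API surface described in that quoted summary (method names are 
paraphrased from the description, not taken from the actual patch):

{code}
// Illustrative outline of a NodeLabelManager-style abstraction: query/modify labels
// and delegate persistence to a backing store (in-memory or FileSystem based).
import java.io.IOException;
import java.util.Map;
import java.util.Set;

public abstract class NodeLabelManagerSketch {
  public abstract Set<String> getNodesWithLabel(String label);
  public abstract Set<String> getLabelsOnNode(String hostname);
  public abstract void addLabels(Set<String> labels) throws IOException;
  public abstract void removeLabels(Set<String> labels) throws IOException;
  public abstract void setLabelsOnNodes(Map<String, Set<String>> nodeToLabels) throws IOException;

  // Memory-based store: no-ops. FileSystem-based store: append an editlog entry on
  // each change and replay mirror + editlog on recover (e.g. at RM restart).
  protected abstract void persistChange(String change) throws IOException;
  protected abstract void recover() throws IOException;
}
{code}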



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146343#comment-14146343
 ] 

Hadoop QA commented on YARN-668:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670845/YARN-668-v3.patch
  against trunk revision ef784a2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 24 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5096//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5096//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5096//console

This message is automatically generated.

> TokenIdentifier serialization should consider Unknown fields
> 
>
> Key: YARN-668
> URL: https://issues.apache.org/jira/browse/YARN-668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-668-demo.patch, YARN-668-v2.patch, 
> YARN-668-v3.patch, YARN-668.patch
>
>
> This would allow changing of the TokenIdentifier between versions. The 
> current serialization is Writable. A simple way to achieve this would be to 
> have a Proto object as the payload for TokenIdentifiers, instead of 
> individual fields.
> TokenIdentifier continues to implement Writable to work with the RPC layer - 
> but the payload itself is serialized using PB.
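A rough sketch of the payload idea from that description (illustrative only; the actual 
patch and class names differ):

{code}
// Illustrative only: the identifier stays Writable for the RPC layer, but its payload
// is a protobuf message, so fields added in newer versions survive as unknown fields.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import com.google.protobuf.Message;

public abstract class ProtoBackedTokenIdentifierSketch implements Writable {
  protected abstract Message getProto();                         // current payload
  protected abstract void setProtoBytes(byte[] bytes) throws IOException;

  @Override
  public void write(DataOutput out) throws IOException {
    byte[] bytes = getProto().toByteArray();
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    setProtoBytes(bytes);  // protobuf preserves fields it does not recognize
  }
}
{code}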



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146350#comment-14146350
 ] 

Hudson commented on YARN-2569:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1881 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1881/])
YARN-2569. Added the log handling APIs for the long running services. 
Contributed by Xuan Gong. (zjshen: rev 5338ac416ab8ab3e7e0a7bfb4a53151fc457f673)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java


> Log Handling for LRS API Changes
> 
>
> Key: YARN-2569
> URL: https://issues.apache.org/jira/browse/YARN-2569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
> YARN-2569.4.1.patch, YARN-2569.4.patch, YARN-2569.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2593) Many tests get failed on trunk

2014-09-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146363#comment-14146363
 ] 

Junping Du commented on YARN-2593:
--

Thanks [~jianhe] and [~kasha] for the quick reply. Looks like trunk is fine for 
now, as I retried the Jenkins test in YARN-668. Will close this JIRA soon.

> Many tests get failed on trunk
> --
>
> Key: YARN-2593
> URL: https://issues.apache.org/jira/browse/YARN-2593
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Priority: Blocker
>
> In YARN-668 we can see many test failures. I already verified that the 
> trunk branch can reproduce these failures.
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
> org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
> org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2593) Many tests get failed on trunk

2014-09-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-2593.
--
Resolution: Fixed

> Many tests get failed on trunk
> --
>
> Key: YARN-2593
> URL: https://issues.apache.org/jira/browse/YARN-2593
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Priority: Blocker
>
> In YARN-668 we can see many test failures. I already verified that the 
> trunk branch can reproduce these failures.
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
> org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
> org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2580) Windows Secure Container Executor: grant job query privileges to the container user

2014-09-24 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu resolved YARN-2580.

Resolution: Implemented

The fix will be contained in the next YARN-2198 patch.

> Windows Secure Container Executor: grant job query privileges to the 
> container user
> ---
>
> Key: YARN-2580
> URL: https://issues.apache.org/jira/browse/YARN-2580
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: YARN-2580.1.patch
>
>
> mapred.MapTask.initialize uses WindowsBasedProcessTree, which uses winutils to 
> query the container NT JOB. This must be granted query permission by 
> hadoopwinutilsvc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2580) Windows Secure Container Executor: grant job query privileges to the container user

2014-09-24 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2580:
---
Attachment: YARN-2580.1.patch

This fixes the problems related to job and process permissions. The job and the 
spawned processes get ACEs explicitly added that grant access to the NM, the 
container user, and a set of hard-coded SIDs (LocalSystem, Administrators).

> Windows Secure Container Executor: grant job query privileges to the 
> container user
> ---
>
> Key: YARN-2580
> URL: https://issues.apache.org/jira/browse/YARN-2580
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: YARN-2580.1.patch
>
>
> mapred.MapTask.initialize uses WindowsBasedProcessTree, which uses winutils to 
> query the container NT JOB. This must be granted query permission by 
> hadoopwinutilsvc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2590) Windows Secure Container Executor: containerLaunch environment does not get transferred to the container process

2014-09-24 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2590:
---
Summary: Windows Secure Container Executor: containerLaunch environment 
does not get transferred to the container process  (was: Windows Secure 
Container Executor: containerLaunch environment doe snot get transferred to the 
container process)

> Windows Secure Container Executor: containerLaunch environment does not get 
> transferred to the container process
> 
>
> Key: YARN-2590
> URL: https://issues.apache.org/jira/browse/YARN-2590
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
>
> The sanitized env prepared by the container launch is ignored by the WSCE 
> launcher. The env has to be passed in the createTaskAsUser call to 
> hadoopwinutilsvc so that it assigns it to the newly spawned process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2594) ResourceManager sometimes becomes unresponsive

2014-09-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146381#comment-14146381
 ] 

Wangda Tan commented on YARN-2594:
--

Hi [~devaraj.k],
Have you already looked into this? I think I've found the root cause of this 
problem; could you assign this ticket to me?

This is a deadlock between the following two threads:
{code}
"IPC Server handler 45 on 8032" daemon prio=10 tid=0x7f032909b000 
nid=0x7bd7 waiting for monitor entry [0x7f0307aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceUsageReport(SchedulerApplicationAttempt.java:541)
- waiting to lock <0xe0e7ea70> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:196)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:703)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:569)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:294)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
{code}

And 

{code}
"ResourceManager Event Processor" prio=10 tid=0x7f0328db9800 nid=0x7aeb 
waiting on condition [0x7f0311a48000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xe0e72bc0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getCurrentAppAttempt(RMAppImpl.java:476)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.updateAttemptMetrics(RMContainerImpl.java:509)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:495)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:484)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
- locked <0xe0e85318> (a 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:373)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.containerCompleted(FiCaSchedulerApp.java:89)
- locked <0xe0e7ea70> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacit

[jira] [Commented] (YARN-2592) Preemption can kill containers to fulfil need of already over-capacity queue.

2014-09-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146391#comment-14146391
 ] 

Jason Lowe commented on YARN-2592:
--

+1 for at least allowing users to configure no preemption to satisfy over 
capacity queues.

> Preemption can kill containers to fulfil need of already over-capacity queue.
> -
>
> Key: YARN-2592
> URL: https://issues.apache.org/jira/browse/YARN-2592
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Eric Payne
>
> There are scenarios in which one over-capacity queue can cause preemption of 
> another over-capacity queue. However, since killing containers may lose work, 
> it doesn't make sense to me to kill containers to feed an already 
> over-capacity queue.
> Consider the following:
> {code}
> root has A,B,C, total capacity = 90
> A.guaranteed = 30, A.pending = 5, A.current = 40
> B.guaranteed = 30, B.pending = 0, B.current = 50
> C.guaranteed = 30, C.pending = 0, C.current = 0
> {code}
> In this case, the queue preemption monitor will kill 5 resources from queue B 
> so that queue A can pick them up, even though queue A is already over its 
> capacity. This could lose any work that those containers in B had already 
> done.
> Is there a use case for this behavior? It seems to me that if a queue is 
> already over its capacity, it shouldn't destroy the work of other queues. If 
> the over-capacity queue needs more resources, that seems to be a problem that 
> should be solved by increasing its guarantee.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146390#comment-14146390
 ] 

Hadoop QA commented on YARN-2588:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670964/YARN-2588.patch
  against trunk revision ef784a2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5097//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5097//console

This message is automatically generated.

> Standby RM does not transitionToActive if previous transitionToActive is 
> failed with ZK exception.
> --
>
> Key: YARN-2588
> URL: https://issues.apache.org/jira/browse/YARN-2588
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0, 2.6.0, 2.5.1
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-2588.patch
>
>
> Consider scenario where, StandBy RM is failed to transition to Active because 
> of ZK exception(connectionLoss or SessionExpired). Then any further 
> transition to Active for same RM does not move RM to Active state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2596) TestWorkPreservingRMRestart failed on trunk

2014-09-24 Thread Junping Du (JIRA)
Junping Du created YARN-2596:


 Summary: TestWorkPreservingRMRestart failed on trunk
 Key: YARN-2596
 URL: https://issues.apache.org/jira/browse/YARN-2596
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du


As seen in the test results from YARN-668, the test failure can be reproduced 
locally without applying the new patch to trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2593) Many tests get failed on trunk

2014-09-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146403#comment-14146403
 ] 

Junping Du commented on YARN-2593:
--

In addition, there is still one test failure left, in TestWorkPreservingRMRestart. 
Filed YARN-2596 to track it.

> Many tests get failed on trunk
> --
>
> Key: YARN-2593
> URL: https://issues.apache.org/jira/browse/YARN-2593
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Priority: Blocker
>
> In YARN-668 we can see many test failures. I already verified that the 
> trunk branch can reproduce these failures.
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
> org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
> org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-668:

Attachment: YARN-668-v4.patch

Fixed the findbugs issue in the v4 patch.

> TokenIdentifier serialization should consider Unknown fields
> 
>
> Key: YARN-668
> URL: https://issues.apache.org/jira/browse/YARN-668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-668-demo.patch, YARN-668-v2.patch, 
> YARN-668-v3.patch, YARN-668-v4.patch, YARN-668.patch
>
>
> This would allow changing of the TokenIdentifier between versions. The 
> current serialization is Writable. A simple way to achieve this would be to 
> have a Proto object as the payload for TokenIdentifiers, instead of 
> individual fields.
> TokenIdentifier continues to implement Writable to work with the RPC layer - 
> but the payload itself is serialized using PB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2594) ResourceManager sometimes becomes unresponsive

2014-09-24 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-2594:
---

Assignee: Wangda Tan  (was: Devaraj K)

[~leftnoteasy] Thanks for your effort. I am assigning it to you. 

> ResourceManager sometimes becomes unresponsive
> -
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
>
> ResourceManager sometimes becomes unresponsive:
> There was no exception in the ResourceManager log, and it contains only the following 
> type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated

2014-09-24 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2312:
-
Attachment: YARN-2312.2.patch

Thanks for your review, Jason.

{quote}
What I meant by my original comment was to not have JvmID derive from ID at all.
{quote}

I agree that it's a better way. Attached an updated patch.

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, YARN-2312.2.patch
>
>
> {{ContainerId#getId}} will only return a partial value of the container id (only the 
> sequence number, without the epoch) after YARN-2229. We should 
> mark {{ContainerId#getId}} as deprecated and use 
> {{ContainerId#getContainerId}} instead.
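A minimal illustration of the deprecation described above (simplified; not the actual 
ContainerId source):

{code}
// Callers migrate from the deprecated int-valued getId() to getContainerId(), which
// carries the full value including the epoch introduced by YARN-2229.
public abstract class ContainerIdSketch {
  /** @deprecated Only returns the sequence number without the epoch; use {@link #getContainerId()}. */
  @Deprecated
  public abstract int getId();

  /** Full container id, including the epoch. */
  public abstract long getContainerId();
}
{code}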



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-09-24 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: YARN-2562.2.patch

Updated the patch:

* Changed the format to include the epoch, e.g. 
container_1410901177871_0001_01_05_e17.
* Added a comment to ContainerId#toString.

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch
>
>
> The ContainerId string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182, e.g. 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2593) Many tests get failed on trunk

2014-09-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reopened YARN-2593:


> Many tests get failed on trunk
> --
>
> Key: YARN-2593
> URL: https://issues.apache.org/jira/browse/YARN-2593
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Priority: Blocker
>
> In YARN-668 we can see many test failures. I already verified that the 
> trunk branch can reproduce these failures.
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
> org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
> org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2593) Many tests get failed on trunk

2014-09-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-2593.

Resolution: Not a Problem

> Many tests get failed on trunk
> --
>
> Key: YARN-2593
> URL: https://issues.apache.org/jira/browse/YARN-2593
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Priority: Blocker
>
> In YARN-668 we can see many test failures. I already verified that the 
> trunk branch can reproduce these failures.
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
> org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
> org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146490#comment-14146490
 ] 

Hadoop QA commented on YARN-2562:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670977/YARN-2562.2.patch
  against trunk revision 034df0e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5099//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5099//console

This message is automatically generated.

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182, e.g. 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2569) Log Handling for LRS API Changes

2014-09-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146511#comment-14146511
 ] 

Hudson commented on YARN-2569:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1906 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1906/])
YARN-2569. Added the log handling APIs for the long running services. 
Contributed by Xuan Gong. (zjshen: rev 5338ac416ab8ab3e7e0a7bfb4a53151fc457f673)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java


> Log Handling for LRS API Changes
> 
>
> Key: YARN-2569
> URL: https://issues.apache.org/jira/browse/YARN-2569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
> YARN-2569.4.1.patch, YARN-2569.4.patch, YARN-2569.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146518#comment-14146518
 ] 

Hadoop QA commented on YARN-668:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670974/YARN-668-v4.patch
  against trunk revision ef784a2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 24 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5098//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5098//console

This message is automatically generated.

> TokenIdentifier serialization should consider Unknown fields
> 
>
> Key: YARN-668
> URL: https://issues.apache.org/jira/browse/YARN-668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-668-demo.patch, YARN-668-v2.patch, 
> YARN-668-v3.patch, YARN-668-v4.patch, YARN-668.patch
>
>
> This would allow changing of the TokenIdentifier between versions. The 
> current serialization is Writable. A simple way to achieve this would be to 
> have a Proto object as the payload for TokenIdentifiers, instead of 
> individual fields.
> TokenIdentifier continues to implement Writable to work with the RPC layer - 
> but the payload itself is serialized using PB.
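
For illustration only (this is not the attached patch), a minimal sketch of the 
idea, assuming a hypothetical generated message class {{MyTokenProto}} with an 
{{owner}} field:

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.TokenIdentifier;

// "MyTokenProto" is a hypothetical generated PB class standing in for the real
// per-token message; the envelope stays Writable, the fields live in the proto.
public class ProtoBackedTokenIdentifier extends TokenIdentifier {
  public static final Text KIND = new Text("PROTO_TOKEN");

  private MyTokenProto proto = MyTokenProto.getDefaultInstance();

  @Override
  public Text getKind() {
    return KIND;
  }

  @Override
  public UserGroupInformation getUser() {
    return UserGroupInformation.createRemoteUser(proto.getOwner());
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // The Writable envelope carries only a length-prefixed PB payload, so
    // unknown fields added in newer versions survive being parsed and
    // re-serialized by older code.
    byte[] payload = proto.toByteArray();
    out.writeInt(payload.length);
    out.write(payload);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    byte[] payload = new byte[in.readInt()];
    in.readFully(payload);
    proto = MyTokenProto.parseFrom(payload);
  }
}
{code}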



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-09-24 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146531#comment-14146531
 ] 

Abin Shahab commented on YARN-1964:
---

Pull request on git: https://github.com/apache/hadoop/pull/2 (comments are 
welcome).

> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
> Attachments: yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to *package* their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2596) TestWorkPreservingRMRestart failed on trunk

2014-09-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146538#comment-14146538
 ] 

Jian He commented on YARN-2596:
---

The test is only failing for the Fair Scheduler; it looks like a recent check-in 
broke the test. [~kasha], would you mind taking a look? Thanks.
{code}
java.lang.NullPointerException  
  at org.apache.hadoop.yarn.util.resource.Resources.addTo(Resources.java:121)   
  at org.apache.hadoop.yarn.util.resource.Resources.add(Resources.java:127) 
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.canRunAppAM(FSLeafQueue.java:316)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:563)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:769)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:215)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1053)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:964)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1131)
  at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1)
  at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:612)
  at java.lang.Thread.run(Thread.java:695) 
{code}

> TestWorkPreservingRMRestart failed on trunk
> ---
>
> Key: YARN-2596
> URL: https://issues.apache.org/jira/browse/YARN-2596
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>
> As the test results from YARN-668 show, the failure can be reproduced locally 
> without applying the new patch to trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2596) TestWorkPreservingRMRestart for FairScheduler failed on trunk

2014-09-24 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2596:
--
Summary: TestWorkPreservingRMRestart for FairScheduler failed on trunk  
(was: TestWorkPreservingRMRestart failed on trunk)

> TestWorkPreservingRMRestart for FairScheduler failed on trunk
> -
>
> Key: YARN-2596
> URL: https://issues.apache.org/jira/browse/YARN-2596
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>
> As the test results from YARN-668 show, the failure can be reproduced locally 
> without applying the new patch to trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2102) More generalized timeline ACLs

2014-09-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146603#comment-14146603
 ] 

Vinod Kumar Vavilapalli commented on YARN-2102:
---

Looks good, +1. Checking this in..

> More generalized timeline ACLs
> --
>
> Key: YARN-2102
> URL: https://issues.apache.org/jira/browse/YARN-2102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch, 
> YARN-2102.2.patch, YARN-2102.3.patch, YARN-2102.5.patch, YARN-2102.6.patch, 
> YARN-2102.7.patch, YARN-2102.8.patch
>
>
> We need to differentiate the access controls of reading and writing 
> operations, and we need to think about cross-entity access control. For 
> example, if we are executing a workflow of MR jobs that writes the 
> timeline data of this workflow, we don't want other users to pollute the 
> timeline data of the workflow by putting something under it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-09-24 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-
Attachment: YARN-1879.14.patch

Refreshed the patch based on the trunk code.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, 
> YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
> YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-09-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146645#comment-14146645
 ] 

Tsuyoshi OZAWA commented on YARN-2562:
--

[~jianhe] [~vinodkv] it's ready for review. Could you take a look, please?

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182, e.g. 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml

2014-09-24 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2284:
-
Attachment: YARN-2284-08.patch

Uploading before I clobber my own changes again.

This adds a framework that throws errors if differences are found. It is turned 
off by default, but it should be turned on once the XML and config files match 
up cleanly.
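
Roughly, the check looks like the sketch below; the class name and the 
fail-fast flag are illustrative, not the ones in the attached patch:

{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.junit.Assert;
import org.junit.Test;

// Cross-check the property names declared as String constants in
// YarnConfiguration against the properties shipped in yarn-default.xml.
public class YarnDefaultXmlCrossCheck {

  // Flip to true once YarnConfiguration and yarn-default.xml match up cleanly.
  private static final boolean FAIL_ON_MISMATCH = false;

  @Test
  public void compareYarnConfigurationWithYarnDefaultXml() throws Exception {
    // Property names declared as public static String constants in YarnConfiguration.
    Set<String> declared = new HashSet<String>();
    for (Field f : YarnConfiguration.class.getDeclaredFields()) {
      if (Modifier.isStatic(f.getModifiers()) && Modifier.isPublic(f.getModifiers())
          && f.getType() == String.class) {
        String value = (String) f.get(null);
        if (value != null && value.startsWith("yarn.")) {
          declared.add(value);
        }
      }
    }

    // Property names listed in yarn-default.xml.
    Configuration conf = new Configuration(false);
    conf.addResource("yarn-default.xml");
    Set<String> inXml = new HashSet<String>();
    for (Map.Entry<String, String> entry : conf) {
      inXml.add(entry.getKey());
    }

    Set<String> missingFromXml = new TreeSet<String>(declared);
    missingFromXml.removeAll(inXml);
    Set<String> missingFromClass = new TreeSet<String>(inXml);
    missingFromClass.removeAll(declared);

    System.out.println("In YarnConfiguration only: " + missingFromXml);
    System.out.println("In yarn-default.xml only: " + missingFromClass);

    if (FAIL_ON_MISMATCH) {
      Assert.assertTrue("Configuration mismatches found",
          missingFromXml.isEmpty() && missingFromClass.isEmpty());
    }
  }
}
{code}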

> Find missing config options in YarnConfiguration and yarn-default.xml
> -
>
> Key: YARN-2284
> URL: https://issues.apache.org/jira/browse/YARN-2284
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-2284-04.patch, YARN-2284-05.patch, 
> YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, 
> YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch
>
>
> YarnConfiguration has one set of properties.  yarn-default.xml has another 
> set of properties.  Ideally, there should be an automatic way to find missing 
> properties in either location.
> This is analogous to MAPREDUCE-5130, but for yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2581) NMs need to find a way to get LogAggregationContext

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146663#comment-14146663
 ] 

Hadoop QA commented on YARN-2581:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670817/YARN-2581.3.patch
  against trunk revision 073bbd8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5101//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5101//console

This message is automatically generated.

> NMs need to find a way to get LogAggregationContext
> ---
>
> Key: YARN-2581
> URL: https://issues.apache.org/jira/browse/YARN-2581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2581.1.patch, YARN-2581.2.patch, YARN-2581.3.patch
>
>
> After YARN-2569, we have LogAggregationContext for application in 
> ApplicationSubmissionContext. NMs need to find a way to get this information.
> We have this requirement: all containers in the same application should 
> honor the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2581) NMs need to find a way to get LogAggregationContext

2014-09-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146693#comment-14146693
 ] 

Xuan Gong commented on YARN-2581:
-

The test case failure is not related; it is tracked by 
https://issues.apache.org/jira/browse/YARN-2596

> NMs need to find a way to get LogAggregationContext
> ---
>
> Key: YARN-2581
> URL: https://issues.apache.org/jira/browse/YARN-2581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2581.1.patch, YARN-2581.2.patch, YARN-2581.3.patch
>
>
> After YARN-2569, we have LogAggregationContext for application in 
> ApplicationSubmissionContext. NMs need to find a way to get this information.
> We have this requirement: all containers in the same application should 
> honor the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-24 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-009.patch

Revised patch

A key change is support for Kerberos digest (id:pass) ACLs in paths under 
user accounts. A user may request that all nodes/records created in a session 
add specific id:pass access to the nodes; these are translated into 
full-access ACL entries (a ZooKeeper-level sketch of such a grant follows the 
list below).

Why do this? It allows a user to give a long-running service the ability to 
manipulate part of the service registry (i.e. its own record and below) without 
having to worry about token expiry. It's not mandatory for long-running 
services to do this; they can go the credential route if they want, this just 
makes it an option. In particular, it allows containers to add records without 
needing any credentials, even if the AM bootstraps the registration from one 
supplied at launch time.

* factory to create kerberos, anonymous and user:pass accessors to the registry
* RM automatically creates the user dir with write access for the user on app 
submission.
* Security model tested in *much* more depth. Specifically, all those different 
levels of access are tested to make sure that extra rights are not being 
granted.
* Helper methods to aid working with the registry in clients and 
AMs/containers. See 
{{org.apache.hadoop.yarn.registry.client.binding.RegistryOperationUtils}}
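
For readers unfamiliar with digest ACLs, the sketch below shows what such a 
grant boils down to at the plain ZooKeeper level; the host, path and 
credentials are placeholders, and this is not the registry API added by the 
patch:

{code}
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;
import org.apache.zookeeper.server.auth.DigestAuthenticationProvider;

public class DigestAclExample {

  static ACL fullAccessDigestAcl(String idPassword) throws NoSuchAlgorithmException {
    // Only the digest of "id:pass" is stored in the ACL, not the clear-text password.
    String digest = DigestAuthenticationProvider.generateDigest(idPassword);
    return new ACL(ZooDefs.Perms.ALL, new Id("digest", digest));
  }

  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        // no-op watcher for the example
      }
    });

    List<ACL> acls = new ArrayList<ACL>();
    acls.add(fullAccessDigestAcl("myservice:secret"));

    // Any client that later calls zk.addAuthInfo("digest", "myservice:secret".getBytes())
    // gets full access to this node, with no Kerberos ticket or token renewal involved.
    zk.create("/registry-demo", new byte[0], acls, CreateMode.PERSISTENT);
    zk.close();
  }
}
{code}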


> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, yarnregistry.pdf, 
> yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2465) Make YARN unit tests work when pseudo distributed cluster is running

2014-09-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146722#comment-14146722
 ] 

Steve Loughran commented on YARN-2465:
--

The application history server also comes up hard-coded to port 10200, which 
fails if something else is already listening there.
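
A common workaround for such fixed-port collisions is to feed an ephemeral 
port into the configuration before starting the cluster; whether every 
MiniYARNCluster sub-service honours these settings is exactly what this report 
is about, so treat the approach below as a sketch of the pattern, not a fix:

{code}
import java.io.IOException;
import java.net.ServerSocket;

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class EphemeralPortConf {

  static int findFreePort() throws IOException {
    // Note: the port could in principle be grabbed again before the real bind.
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    YarnConfiguration conf = new YarnConfiguration();
    // Point the timeline/AHS RPC and web endpoints away from their defaults.
    conf.set(YarnConfiguration.TIMELINE_SERVICE_ADDRESS,
        "localhost:" + findFreePort());
    conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS,
        "localhost:" + findFreePort());
    System.out.println("Timeline RPC at "
        + conf.get(YarnConfiguration.TIMELINE_SERVICE_ADDRESS));
  }
}
{code}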

> Make YARN unit tests work when pseudo distributed cluster is running
> 
>
> Key: YARN-2465
> URL: https://issues.apache.org/jira/browse/YARN-2465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-2465.patch
>
>
> This is useful for development where you might have some pseudo distributed 
> cluster in the background and don't want to stop it to run unit test cases. 
> Most YARN test cases pass, except for some tests that use the localization 
> service and try to bind to its default port.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2597) MiniYARNCluster doesn't propagate reason for AHS not starting

2014-09-24 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-2597:


 Summary: MiniYARNCluster doesn't propagate reason for AHS not 
starting
 Key: YARN-2597
 URL: https://issues.apache.org/jira/browse/YARN-2597
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Steve Loughran


If the AHS doesn't come up, your test run gets an exception telling you this 
fact -but the underlying cause is not propagated.

As YARN services do record their failure cause, extracting and propagating this 
is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2597) MiniYARNCluster doesn't propagate reason for AHS not starting

2014-09-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146726#comment-14146726
 ] 

Steve Loughran commented on YARN-2597:
--

Without the patch

{code}
testContainerLaunchFailureHandling(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 4.209 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: 
ApplicationHistoryServer failed to start. Final state is STOPPED
at 
org.apache.hadoop.yarn.server.MiniYARNCluster$ApplicationHistoryServerWrapper.serviceStart(MiniYARNCluster.java:736)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setup(TestDistributedShell.java:92)

{code}

With the patch
{code}


testDSShellWithMultipleArgs(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 4.323 sec  <<< ERROR!
org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
ApplicationHistoryServer failed to start. Final state is STOPPED
at 
org.apache.hadoop.yarn.server.MiniYARNCluster$ApplicationHistoryServerWrapper.serviceStart(MiniYARNCluster.java:737)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setup(TestDistributedShell.java:92)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.net.BindException: Problem binding to [0.0.0.0:10200] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
at 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.serviceStart(ApplicationHistoryClientService.java:87)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:109)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.MiniYARNCluster$ApplicationHistoryServerWrapper$1.run(MiniYARNCluster.java:726)
Caused by: java.net.BindException: Problem binding to [0.0.0.0:10200] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:719)
at org.apache.hadoop.ipc.Server.bind(Server.java:427)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:576)
at org.apache.hadoop.ipc.Server.<init>(Server.java:2291)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:935)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:537)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:512)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:780)
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
at 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.serviceStart(ApplicationHistoryClientService.java:87)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:109)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.MiniYARNCluster$ApplicationHistoryServerWrapper$1.run(MiniYARNCluster.java:726)

{code}


> MiniYARNCluster doesn't propagate reason for AHS not starting
> -
>
> Key: YARN-2597
> URL: https://issues.apache.org/jira/browse/YARN-2597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
>
> If the AHS doesn't come up, your test run gets an exception telling you this 
> fact -but the underlying cause is not propagated.
> As YARN services do record their failure cause, extracting and propagating 
> this is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2597) MiniYARNCluster doesn't propagate reason for AHS not starting

2014-09-24 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2597:
-
Attachment: YARN-2597-001.patch

propagates failure cause on AHS startup failure

> MiniYARNCluster doesn't propagate reason for AHS not starting
> -
>
> Key: YARN-2597
> URL: https://issues.apache.org/jira/browse/YARN-2597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
> Attachments: YARN-2597-001.patch
>
>
> If the AHS doesn't come up, your test run gets an exception telling you this 
> fact -but the underlying cause is not propagated.
> As YARN services do record their failure cause, extracting and propagating 
> this is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2592) Preemption can kill containers to fulfil need of already over-capacity queue.

2014-09-24 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146728#comment-14146728
 ] 

Carlo Curino commented on YARN-2592:


Preemption is trying to enforce the scheduler invariants, one of which is how 
over-capacity is distributed among queues (weighted fairly by rightful 
capacity).

I understand the desire to "protect" individual containers, and there will be 
many specific examples we can come up with in which killing a container is a 
pity as it loses some work (unless it handles the preemption message correctly 
and checkpoints its state), but long term I think enforcing the invariants is 
more important (fair and predictable for users). The opposite argument one can 
make is "why is queue B allowed to retain more over-capacity than A?"; if this 
happens systematically or for long periods of time, it is as unnerving for 
users as some lost work.

Also note that preemption already has a few built-in mechanisms (dead-zones and 
grace periods) designed to limit the impact on running tasks. Are we sure that 
proper tuning of capacity/max-capacity/dead-zones/grace-periods (see the 
configuration sketch after this comment) is not enough to remove 99% of the 
problem? This would only be an issue for long-running tasks (exceeding 2x the 
grace period) that run above the capacity + dead-zone of a queue but within 
max-capacity, and it would only trigger for a queue that is more over capacity 
than any other peer queue, when the peer queue also has over-capacity needs 
exceeding the free space and no under-capacity queue is demanding the same 
resources. We should make sure this is a significant enough scenario in 
practice to justify the complexity of new configurables.

I am definitely opposed to making this the default behavior, but I agree with 
Jason that we could add config parameters that allow preemption for 
over-capacity balancing to be disabled. I feel, though, that this is a slippery 
slope which might lead to many loopholes (protecting the AM being another one) 
and eventually make configuring preemption, and understanding what is 
happening, very hard for users.

I think promoting proper handling of preemption on the app side (i.e., 
checkpoint your state, or redistribute your computation) is overall a 
healthier direction.

My 2 cents..
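
For reference, the sketch below sets the existing knobs mentioned above 
(monitor enablement, dead-zone, grace period, per-round cap and the 
observe-only mode discussed later in this thread) programmatically; the 
property names are those of the capacity-scheduler preemption policy as I 
understand them, so double-check them against the Hadoop version in use:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PreemptionTuningExample {
  public static YarnConfiguration preemptionTunedConf() {
    YarnConfiguration conf = new YarnConfiguration();
    // Enable the scheduling monitor that runs the preemption policy.
    conf.setBoolean("yarn.resourcemanager.scheduler.monitor.enable", true);
    // Dead-zone: ignore queues that are within 20% of their capacity.
    conf.setFloat(
        "yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity",
        0.2f);
    // Grace period: give containers 60s to exit after being marked for preemption.
    conf.setLong(
        "yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill",
        60000);
    // Cap how much of the cluster can be preempted in a single round.
    conf.setFloat(
        "yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round",
        0.1f);
    // Observe-only mode: log what the policy would do without preempting anything.
    conf.setBoolean(
        "yarn.resourcemanager.monitor.capacity.preemption.observe_only", true);
    return conf;
  }
}
{code}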


> Preemption can kill containers to fulfil need of already over-capacity queue.
> -
>
> Key: YARN-2592
> URL: https://issues.apache.org/jira/browse/YARN-2592
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Eric Payne
>
> There are scenarios in which one over-capacity queue can cause preemption of 
> another over-capacity queue. However, since killing containers may lose work, 
> it doesn't make sense to me to kill containers to feed an already 
> over-capacity queue.
> Consider the following:
> {code}
> root has A,B,C, total capacity = 90
> A.guaranteed = 30, A.pending = 5, A.current = 40
> B.guaranteed = 30, B.pending = 0, B.current = 50
> C.guaranteed = 30, C.pending = 0, C.current = 0
> {code}
> In this case, the queue preemption monitor will kill 5 resources from queue B 
> so that queue A can pick them up, even though queue A is already over its 
> capacity. This could lose any work that those containers in B had already 
> done.
> Is there a use case for this behavior? It seems to me that if a queue is 
> already over its capacity, it shouldn't destroy the work of other queues. If 
> the over-capacity queue needs more resources, that seems to be a problem that 
> should be solved by increasing its guarantee.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146739#comment-14146739
 ] 

Hadoop QA commented on YARN-2312:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670976/YARN-2312.2.patch
  against trunk revision 034df0e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat
  org.apache.hadoop.mapred.TestJavaSerialization

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5100//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5100//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5100//console

This message is automatically generated.

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, YARN-2312.2.patch
>
>
> {{ContainerId#getId}} will only return a partial value of the container id 
> (the sequence number without the epoch) after YARN-2229. We should 
> mark {{ContainerId#getId}} as deprecated and use 
> {{ContainerId#getContainerId}} instead.
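
A minimal sketch of the intended migration, assuming the post-YARN-2229 
accessors:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ContainerIdMigration {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1410901177871L, 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);
    ContainerId containerId = ContainerId.newInstance(attemptId, 5);

    int oldStyle = containerId.getId();            // to be marked @Deprecated
    long newStyle = containerId.getContainerId();  // full id, carries the epoch

    System.out.println(containerId + ": " + oldStyle + " vs " + newStyle);
  }
}
{code}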



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2581) NMs need to find a way to get LogAggregationContext

2014-09-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146742#comment-14146742
 ] 

Zhijie Shen commented on YARN-2581:
---

It looks good to me overall. Just some nits. 

1. Can't we let this.logAggregationContext = logContextData?
{code}
+  LogAggregationContextPBImpl logContextData =
+  new LogAggregationContextPBImpl(
+LogAggregationContextProto.parseFrom(bytes));
+  this.logAggregationContext =
+  LogAggregationContext.newInstance(logContextData.getIncludePattern(),
+logContextData.getExcludePattern(),
+logContextData.getRollingIntervalSeconds());
{code}

2. Can the following
{code}
+Assert.assertEquals(returned.getIncludePattern(), "includePattern");
+Assert.assertEquals(returned.getExcludePattern(), "excludePattern");
+Assert.assertTrue(returned.getRollingIntervalSeconds() == interval);
{code}
be changed to
{code}
+Assert.assertEquals("includePattern", returned.getIncludePattern());
+Assert.assertEquals("excludePattern", returned.getExcludePattern());
+Assert.assertEquals(interval, returned.getRollingIntervalSeconds());
{code}
Though it makes no difference when the assertion passes, it shows a more 
accurate error message when the assertion fails.

[~vinodkv], do you want to have a look at it too?

> NMs need to find a way to get LogAggregationContext
> ---
>
> Key: YARN-2581
> URL: https://issues.apache.org/jira/browse/YARN-2581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2581.1.patch, YARN-2581.2.patch, YARN-2581.3.patch
>
>
> After YARN-2569, we have LogAggregationContext for application in 
> ApplicationSubmissionContext. NMs need to find a way to get this information.
> We have this requirement: all containers in the same application should 
> honor the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2596) TestWorkPreservingRMRestart for FairScheduler failed on trunk

2014-09-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-2596:
--

Assignee: Karthik Kambatla

> TestWorkPreservingRMRestart for FairScheduler failed on trunk
> -
>
> Key: YARN-2596
> URL: https://issues.apache.org/jira/browse/YARN-2596
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Assignee: Karthik Kambatla
>
> As the test results from YARN-668 show, the failure can be reproduced locally 
> without applying the new patch to trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2596) TestWorkPreservingRMRestart for FairScheduler failed on trunk

2014-09-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146747#comment-14146747
 ] 

Karthik Kambatla commented on YARN-2596:


Looking into it. 

> TestWorkPreservingRMRestart for FairScheduler failed on trunk
> -
>
> Key: YARN-2596
> URL: https://issues.apache.org/jira/browse/YARN-2596
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Assignee: Karthik Kambatla
>
> As the test results from YARN-668 show, the failure can be reproduced locally 
> without applying the new patch to trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146756#comment-14146756
 ] 

Hadoop QA commented on YARN-2284:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12671007/YARN-2284-08.patch
  against trunk revision 073bbd8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5103//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5103//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5103//console

This message is automatically generated.

> Find missing config options in YarnConfiguration and yarn-default.xml
> -
>
> Key: YARN-2284
> URL: https://issues.apache.org/jira/browse/YARN-2284
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-2284-04.patch, YARN-2284-05.patch, 
> YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, 
> YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch
>
>
> YarnConfiguration has one set of properties.  yarn-default.xml has another 
> set of properties.  Ideally, there should be an automatic way to find missing 
> properties in either location.
> This is analogous to MAPREDUCE-5130, but for yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2592) Preemption can kill containers to fulfil need of already over-capacity queue.

2014-09-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146757#comment-14146757
 ] 

Jason Lowe commented on YARN-2592:
--

IMHO users shouldn't be complaining if they are getting their guarantees (i.e.: 
the capacity of the queue).  Anything over capacity is "bonus" and they 
shouldn't rely on the scheduler going out of its way to give it more.  If they 
can't get their stuff done within their configured capacity then they need more 
capacity.

bq. I think promoting proper handling of preemption on the app side (i.e., 
checkpoint your state, or redistributed your computation) is overall a 
healthier direction. 

I agree with the theory.  If preempting is "cheap" then we should leverage it 
more often to solve resource contention.  The problem in practice is that it's 
often outside the hands of ops and even the users.  YARN is becoming more and 
more general, including app frameworks that aren't part of the core Hadoop 
stack, and I think it will be commonplace for quite some time that at least 
some apps won't have checkpoint/migration support.  That makes preemption 
not-so-cheap, which means we don't want to use it unless really necessary.  
Killing containers to give another queue more "bonus" resources is unnecessary, 
and it is therefore preferable to avoid it when preemption isn't cheap.  If those 
resources really are necessary then the queue should have more guaranteed 
capacity rather than expect the scheduler to kill other containers when it's 
beyond capacity.

> Preemption can kill containers to fulfil need of already over-capacity queue.
> -
>
> Key: YARN-2592
> URL: https://issues.apache.org/jira/browse/YARN-2592
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Eric Payne
>
> There are scenarios in which one over-capacity queue can cause preemption of 
> another over-capacity queue. However, since killing containers may lose work, 
> it doesn't make sense to me to kill containers to feed an already 
> over-capacity queue.
> Consider the following:
> {code}
> root has A,B,C, total capacity = 90
> A.guaranteed = 30, A.pending = 5, A.current = 40
> B.guaranteed = 30, B.pending = 0, B.current = 50
> C.guaranteed = 30, C.pending = 0, C.current = 0
> {code}
> In this case, the queue preemption monitor will kill 5 resources from queue B 
> so that queue A can pick them up, even though queue A is already over its 
> capacity. This could lose any work that those containers in B had already 
> done.
> Is there a use case for this behavior? It seems to me that if a queue is 
> already over its capacity, it shouldn't destroy the work of other queues. If 
> the over-capacity queue needs more resources, that seems to be a problem that 
> should be solved by increasing its guarantee.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146776#comment-14146776
 ] 

Hadoop QA commented on YARN-1879:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12671002/YARN-1879.14.patch
  against trunk revision 073bbd8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5102//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5102//console

This message is automatically generated.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, 
> YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
> YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2592) Preemption can kill containers to fulfil need of already over-capacity queue.

2014-09-24 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146790#comment-14146790
 ] 

Carlo Curino commented on YARN-2592:


I hear you, and I agree we will need to cope with non-cheap preemption for a 
while; even long term, not everyone will be nicely preemptable (our work on 
YARN-1051, for example, is designed to allow people to get strongly guaranteed 
and protected resources when needed).

However, the compromise you propose means that the over-capacity "zone" is 
weirdly policed: on one side we expect the "giving" of containers to respect 
a notion of fairness (proportional to your rightful capacity), which is in 
turn not enforced by preemption. I find this inconsistent.

Moreover, as I was saying, I think this will only spare containers in a rather 
narrow band (when an imbalance has happened among over-capacity queues, no 
under-capacity queues are requesting resources yet, we are above the 
dead-zone, and tasks run longer than 2x the grace period). Is this a large 
enough use case to require special-casing?
If this is important in practice and an adoption show-stopper, I am fine with 
compromises, but we should make sure that is the case.

A way to do this is to enable preemption but run it in "observe-only" mode, 
where the policy logs what it would like to do without actually doing it. We 
can then see whether on a real cluster we are often (or ever) in the scenario 
you are trying to address.




> Preemption can kill containers to fulfil need of already over-capacity queue.
> -
>
> Key: YARN-2592
> URL: https://issues.apache.org/jira/browse/YARN-2592
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Eric Payne
>
> There are scenarios in which one over-capacity queue can cause preemption of 
> another over-capacity queue. However, since killing containers may lose work, 
> it doesn't make sense to me to kill containers to feed an already 
> over-capacity queue.
> Consider the following:
> {code}
> root has A,B,C, total capacity = 90
> A.guaranteed = 30, A.pending = 5, A.current = 40
> B.guaranteed = 30, B.pending = 0, B.current = 50
> C.guaranteed = 30, C.pending = 0, C.current = 0
> {code}
> In this case, the queue preemption monitor will kill 5 resources from queue B 
> so that queue A can pick them up, even though queue A is already over its 
> capacity. This could lose any work that those containers in B had already 
> done.
> Is there a use case for this behavior? It seems to me that if a queue is 
> already over its capacity, it shouldn't destroy the work of other queues. If 
> the over-capacity queue needs more resources, that seems to be a problem that 
> should be solved by increasing its guarantee.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2581) NMs need to find a way to get LogAggregationContext

2014-09-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2581:

Attachment: YARN-2581.4.patch

Thanks for the review. Fixed all the comments.

> NMs need to find a way to get LogAggregationContext
> ---
>
> Key: YARN-2581
> URL: https://issues.apache.org/jira/browse/YARN-2581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2581.1.patch, YARN-2581.2.patch, YARN-2581.3.patch, 
> YARN-2581.4.patch
>
>
> After YARN-2569, we have LogAggregationContext for application in 
> ApplicationSubmissionContext. NMs need to find a way to get this information.
> We have this requirement: all containers in the same application should 
> honor the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2597) MiniYARNCluster doesn't propagate reason for AHS not starting

2014-09-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146795#comment-14146795
 ] 

Allen Wittenauer commented on YARN-2597:


+1 lgtm :)

> MiniYARNCluster doesn't propagate reason for AHS not starting
> -
>
> Key: YARN-2597
> URL: https://issues.apache.org/jira/browse/YARN-2597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
> Attachments: YARN-2597-001.patch
>
>
> If the AHS doesn't come up, your test run gets an exception telling you this 
> fact -but the underlying cause is not propagated.
> As YARN services do record their failure cause, extracting and propagating 
> this is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2569) API changes for handling logs of long-running services

2014-09-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2569:
--
Summary: API changes for handling logs of long-running services  (was: Log 
Handling for LRS API Changes)

> API changes for handling logs of long-running services
> --
>
> Key: YARN-2569
> URL: https://issues.apache.org/jira/browse/YARN-2569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2569.1.patch, YARN-2569.2.patch, YARN-2569.3.patch, 
> YARN-2569.4.1.patch, YARN-2569.4.patch, YARN-2569.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2443) Handling logs of long-running services on YARN

2014-09-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2443:
--
Summary: Handling logs of long-running services on YARN  (was: Log Handling 
for Long Running Service)

> Handling logs of long-running services on YARN
> --
>
> Key: YARN-2443
> URL: https://issues.apache.org/jira/browse/YARN-2443
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146848#comment-14146848
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12671012/YARN-913-009.patch
  against trunk revision d78b452.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 36 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1266 javac 
compiler warnings (more than the trunk's current 1265 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 3 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/5104//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
  
org.apache.hadoop.yarn.registry.secure.TestSecureRMRegistryOperations
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5104//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5104//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5104//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-registry.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5104//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5104//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, yarnregistry.pdf, 
> yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2594) ResourceManger sometimes become un-responsive

2014-09-24 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146890#comment-14146890
 ] 

zhihai xu commented on YARN-2594:
-

These two threads alone won't cause the deadlock, because they only access 
RMAppImpl.readLock.
There is another thread that accesses RMAppImpl.writeLock, shown in the following 
stack trace:
{code}
"AsyncDispatcher event handler" prio=10 tid=0x7f0328b2e800 nid=0x7c58 
waiting on condition [0x7f0306d9d000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xe0e72bc0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:698)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:716)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:700)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
{code}

I think these three threads cause the deadlock.
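
For illustration, here is a minimal standalone sketch (not the RM code path) of the 
general cycle being described: a thread holding the read lock waits for an event 
that only the thread blocked on the write lock would deliver, so neither can make 
progress. The class, thread, and latch names are illustrative simplifications; the 
actual cycle involves the three RM threads discussed above.
{code}
// Intentionally deadlocks when run, to illustrate the read-lock / write-lock /
// event-handler cycle. Not RM code.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteDeadlockSketch {
  public static void main(String[] args) throws Exception {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CountDownLatch eventHandled = new CountDownLatch(1);

    Thread reader = new Thread(() -> {
      lock.readLock().lock();          // analogous to a caller holding the readLock
      try {
        eventHandled.await();          // waits for the dispatcher thread
      } catch (InterruptedException ignored) {
      } finally {
        lock.readLock().unlock();
      }
    }, "reader");

    Thread dispatcher = new Thread(() -> {
      lock.writeLock().lock();         // analogous to handle() needing the writeLock
      try {
        eventHandled.countDown();      // never reached: blocked acquiring writeLock
      } finally {
        lock.writeLock().unlock();
      }
    }, "AsyncDispatcher event handler");

    reader.start();
    Thread.sleep(100);                 // let the reader grab the read lock first
    dispatcher.start();

    reader.join();                     // never returns: the two threads wait on each other
    dispatcher.join();
  }
}
{code}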

> ResourceManger sometimes become un-responsive
> -
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
>
> ResourceManager sometimes becomes un-responsive:
> There was an exception in the ResourceManager log, and it contains only the following 
> type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2596) TestWorkPreservingRMRestart for FairScheduler failed on trunk

2014-09-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2596:
---
Attachment: yarn-2596-1.patch

I poked around a little, but couldn't zero in on which commit caused this. 

Here is a patch that fixes the test though. 

> TestWorkPreservingRMRestart for FairScheduler failed on trunk
> -
>
> Key: YARN-2596
> URL: https://issues.apache.org/jira/browse/YARN-2596
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Assignee: Karthik Kambatla
> Attachments: yarn-2596-1.patch
>
>
> As seen in the test results from YARN-668, the test failure can be reproduced 
> locally on trunk without applying the new patch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2581) NMs need to find a way to get LogAggregationContext

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146894#comment-14146894
 ] 

Hadoop QA commented on YARN-2581:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12671031/YARN-2581.4.patch
  against trunk revision 9fa5a89.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5105//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5105//console

This message is automatically generated.

> NMs need to find a way to get LogAggregationContext
> ---
>
> Key: YARN-2581
> URL: https://issues.apache.org/jira/browse/YARN-2581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2581.1.patch, YARN-2581.2.patch, YARN-2581.3.patch, 
> YARN-2581.4.patch
>
>
> After YARN-2569, we have LogAggregationContext for application in 
> ApplicationSubmissionContext. NMs need to find a way to get this information.
> We have this requirement: all containers in the same application should 
> honor the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-09-24 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-90:
--
Attachment: apache-yarn-90.5.patch

Uploaded a new patch to address [~jlowe]'s comments.

{quote}
It's a bit odd to have a hash map to map disk error types to lists of 
directories, fill them all in, but we only in practice actually look at one 
type in the map and that's DISK_FULL. It'd be simpler (and faster and less 
space since there's no hashmap involved) to just track full disks as a separate 
collection like we already do for localDirs and failedDirs.
{quote}

Fixed. I renamed failedDirs to errorDirs and added a list for fullDirs. The 
getFailedDirs() function returns a union of the two.
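
For illustration only, a tiny sketch of that split, reusing the names from this 
comment (errorDirs, fullDirs, getFailedDirs); it is not the actual 
DirectoryCollection code from the patch.
{code}
// Sketch only, assuming the names described above; not patch code.
import java.util.ArrayList;
import java.util.List;

class DiskDirsSketch {
  private final List<String> errorDirs = new ArrayList<>(); // disks with I/O or permission errors
  private final List<String> fullDirs = new ArrayList<>();  // disks over the utilization cutoff

  /** Failed dirs = union of error dirs and full dirs. */
  synchronized List<String> getFailedDirs() {
    List<String> failed = new ArrayList<>(errorDirs);
    failed.addAll(fullDirs);
    return failed;
  }
}
{code}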

{quote}
Nit: DISK_ERROR_CAUSE should be DiskErrorCause (if we keep the enum) to match 
the style of other enum types in the code.
{quote}

Fixed.

{quote}
In verifyDirUsingMkdir, if an error occurs during the finally clause then that 
exception will mask the original exception
{quote}

Fixed.

{quote}
isDiskUsageUnderPercentageLimit is named backwards. Disk usage being under the 
configured limit shouldn't be a full disk error, and the error message is 
inconsistent with the method name (method talks about being under but error 
message says its above).
{noformat}
if (isDiskUsageUnderPercentageLimit(testDir)) {
  msg =
  "used space above threshold of "
  + diskUtilizationPercentageCutoff
  + "%, removing from the list of valid directories.";
{noformat}
{quote}

Yep, thanks for catching it. Fixed.

{quote}
We should only call getDisksHealthReport() once in the following code:
{noformat}
+String report = getDisksHealthReport();
+if (!report.isEmpty()) {
+  LOG.info("Disk(s) failed. " + getDisksHealthReport());
{noformat}
{quote}
Fixed.

{quote}
Should updateDirsAfterTest always say "Disk(s) failed" if the report isn't 
empty? Thinking of the case where two disks go bad, then one later is restored. 
The health report will still have something, but that last update is a disk 
turning good not failing. Before this code was only called when a new disk 
failed, and now that's not always the case. Maybe it should just be something 
like "Disk health update: " instead?
{quote}

I've changed it to "Disk(s) health report: ". My only concern with this is that 
there might be scripts looking for the "Disk(s) failed" log line for 
monitoring. What do you think?

{quote}
Is it really necessary to stat a directory before we try to delete it? Seems 
like we can just try to delete it.
{quote}

Just wanted to avoid an unnecessary attempt. If a disk comes back as good 
while a container is running, it won't have the container directories, leading to 
an unnecessary delete.

{quote}
The idiom of getting the directories and adding the full directories seems 
pretty common. Might be good to have dirhandler methods that already do this, 
like getLocalDirsForCleanup or getLogDirsForCleanup.
{quote}

Fixed.

{quote}
I'm a bit worried that getInitializedLocalDirs could potentially try to delete 
an entire directory tree for a disk. If this fails in some sector-specific way 
but other containers are currently using their files from other sectors just 
fine on the same disk, removing these files from underneath active containers 
could be very problematic and difficult to debug.
{quote}

Fixed. Directories are only cleaned up during startup. The code tests for 
existence of the directories and the correct permissions. This does mean that 
container directories left behind for any reason won't get cleaned up until the 
NodeManager is restarted. Is that OK?

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch
>
>
> MAPREDUCE-3121 makes the NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), the NodeManager needs a restart. This JIRA is to improve the NodeManager to 
> reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2546) REST API for application creation/submission is using strings for numeric & boolean values

2014-09-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146900#comment-14146900
 ] 

Zhijie Shen commented on YARN-2546:
---

[~vvasudev], LGTM overall. Some minor comments:

1. Make the app Id consistent?
{code}
  Response Body:

+---+
{
  "application-id":"application_1410870995658_0001",
  "maximum-resource-capability":
{
  "memory":8192,
  "vCores":32
}
}
+---+
{code}
{code}
  Response Body:

+---+
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<NewApplication>
  <application-id>application_1404198295326_0003</application-id>
  <maximum-resource-capability>
    <memory>8192</memory>
    <vCores>32</vCores>
  </maximum-resource-capability>
</NewApplication>
+---+
{code}

2. Maybe this change is not necessary? JAXBContextResolver can be added in 
another place.
{code}
-WebResource r = resource();
+ClientConfig cfg = new DefaultClientConfig();
+cfg.getClasses().add(JAXBContextResolver.class);
+Client client = Client.create(cfg);
+client.addFilter(new LoggingFilter(System.out));
+WebResource r = client.resource(resource().getURI());
{code}
This is what I did before in TestTimelineWebServices:
{code}
  public TestTimelineWebServices() {
super(new WebAppDescriptor.Builder(
"org.apache.hadoop.yarn.server.applicationhistoryservice.webapp")
.contextListenerClass(GuiceServletConfig.class)
.filterClass(com.google.inject.servlet.GuiceFilter.class)
.contextPath("jersey-guice-filter")
.servletPath("/")
.clientConfig(
new DefaultClientConfig(YarnJacksonJaxbJsonProvider.class))
.build());
  }
{code}

> REST API for application creation/submission is using strings for numeric & 
> boolean values
> --
>
> Key: YARN-2546
> URL: https://issues.apache.org/jira/browse/YARN-2546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 2.5.1
>Reporter: Doug Haigh
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2546.0.patch
>
>
> When YARN responds with or accepts JSON, numbers & booleans are being 
> represented as strings which can cause parsing problems.
> Resource values look like 
> {
>   "application-id":"application_1404198295326_0001",
>   "maximum-resource-capability":
>{
>   "memory":"8192",
>   "vCores":"32"
>}
> }
> Instead of
> {
>   "application-id":"application_1404198295326_0001",
>   "maximum-resource-capability":
>{
>   "memory":8192,
>   "vCores":32
>}
> }
> When I POST to start a job, numeric values are represented as numbers:
>   "local-resources":
>   {
> "entry":
> [
>   {
> "key":"AppMaster.jar",
> "value":
> {
>   
> "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
>   "type":"FILE",
>   "visibility":"APPLICATION",
>   "size": "43004",
>   "timestamp": "1405452071209"
> }
>   }
> ]
>   },
> Instead of
>   "local-resources":
>   {
> "entry":
> [
>   {
> "key":"AppMaster.jar",
> "value":
> {
>   
> "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
>   "type":"FILE",
>   "visibility":"APPLICATION",
>   "size": 43004,
>   "timestamp": 1405452071209
> }
>   }
> ]
>   },
> Similarly, Boolean values are also represented as strings:
> "keep-containers-across-application-attempts":"false"
> Instead of 
> "keep-containers-across-application-attempts":false



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2581) NMs need to find a way to get LogAggregationContext

2014-09-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146901#comment-14146901
 ] 

Xuan Gong commented on YARN-2581:
-

The test case failure is not related; it is tracked by 
https://issues.apache.org/jira/browse/YARN-2596

> NMs need to find a way to get LogAggregationContext
> ---
>
> Key: YARN-2581
> URL: https://issues.apache.org/jira/browse/YARN-2581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2581.1.patch, YARN-2581.2.patch, YARN-2581.3.patch, 
> YARN-2581.4.patch
>
>
> After YARN-2569, we have LogAggregationContext for application in 
> ApplicationSubmissionContext. NMs need to find a way to get this information.
> We have this requirement: all containers in the same application should 
> honor the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml

2014-09-24 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146910#comment-14146910
 ] 

Ray Chiang commented on YARN-2284:
--

RE: findbugs

This one is new to me.  I'd guess it's unrelated, but can someone point me at 
what else I should be looking for?

Inconsistent synchronization of 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.delegationTokenSequenceNumber;
 locked 71% of time
Bug type IS2_INCONSISTENT_SYNC (click for details) 
In class 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager
Field 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.delegationTokenSequenceNumber
Synchronized 71% of the time


> Find missing config options in YarnConfiguration and yarn-default.xml
> -
>
> Key: YARN-2284
> URL: https://issues.apache.org/jira/browse/YARN-2284
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-2284-04.patch, YARN-2284-05.patch, 
> YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, 
> YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch
>
>
> YarnConfiguration has one set of properties.  yarn-default.xml has another 
> set of properties.  Ideally, there should be an automatic way to find missing 
> properties in either location.
> This is analogous to MAPREDUCE-5130, but for yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml

2014-09-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146919#comment-14146919
 ] 

Karthik Kambatla commented on YARN-2284:


HADOOP-11122 is to fix this. 

> Find missing config options in YarnConfiguration and yarn-default.xml
> -
>
> Key: YARN-2284
> URL: https://issues.apache.org/jira/browse/YARN-2284
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-2284-04.patch, YARN-2284-05.patch, 
> YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, 
> YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch
>
>
> YarnConfiguration has one set of properties.  yarn-default.xml has another 
> set of properties.  Ideally, there should be an automatic way to find missing 
> properties in either location.
> This is analogous to MAPREDUCE-5130, but for yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2596) TestWorkPreservingRMRestart for FairScheduler failed on trunk

2014-09-24 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146921#comment-14146921
 ] 

Sandy Ryza commented on YARN-2596:
--

+1 pending jenkins

> TestWorkPreservingRMRestart for FairScheduler failed on trunk
> -
>
> Key: YARN-2596
> URL: https://issues.apache.org/jira/browse/YARN-2596
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Assignee: Karthik Kambatla
> Attachments: yarn-2596-1.patch
>
>
> As seen in the test results from YARN-668, the test failure can be reproduced 
> locally on trunk without applying the new patch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146925#comment-14146925
 ] 

Jian He commented on YARN-668:
--

Thanks, Junping! Some comments on the patch:
- The newly added version id may not be needed. For compatibility issues, we can 
either explicitly check for a version id mismatch and throw a version-mismatch 
exception, or we can just check the specific required field and throw a logic 
exception.
- Indentation of the annotation:
{code}
@Private
  public ApplicationAttemptId getApplicationAttemptId() {
{code}
- {{AMRMTokenSecretManager#newInstance()}}: if this method is only used by tests, 
can we move it to the test code?
- previous comment from Vinod: “The proto definitions need to be in 
server-common. ” 
- To convert the stream into a byte array: {{byte[] buffer = 
IOUtils.toByteArray(dis);}} 

> TokenIdentifier serialization should consider Unknown fields
> 
>
> Key: YARN-668
> URL: https://issues.apache.org/jira/browse/YARN-668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-668-demo.patch, YARN-668-v2.patch, 
> YARN-668-v3.patch, YARN-668-v4.patch, YARN-668.patch
>
>
> This would allow changing of the TokenIdentifier between versions. The 
> current serialization is Writable. A simple way to achieve this would be to 
> have a Proto object as the payload for TokenIdentifiers, instead of 
> individual fields.
> TokenIdentifier continues to implement Writable to work with the RPC layer - 
> but the payload itself is serialized using PB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml

2014-09-24 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146927#comment-14146927
 ] 

Ray Chiang commented on YARN-2284:
--

Great.  Thanks!

> Find missing config options in YarnConfiguration and yarn-default.xml
> -
>
> Key: YARN-2284
> URL: https://issues.apache.org/jira/browse/YARN-2284
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-2284-04.patch, YARN-2284-05.patch, 
> YARN-2284-06.patch, YARN-2284-07.patch, YARN-2284-08.patch, 
> YARN2284-01.patch, YARN2284-02.patch, YARN2284-03.patch
>
>
> YarnConfiguration has one set of properties.  yarn-default.xml has another 
> set of properties.  Ideally, there should be an automatic way to find missing 
> properties in either location.
> This is analogous to MAPREDUCE-5130, but for yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field

2014-09-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146931#comment-14146931
 ] 

Jason Lowe commented on YARN-2523:
--

Thanks for updating the patch.  I think it looks good overall.

[~jianhe] could you take a look?  This patch undoes a chunk of YARN-1071, and I 
want to make sure we don't accidentally regress something there.

> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> --
>
> Key: YARN-2523
> URL: https://issues.apache.org/jira/browse/YARN-2523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 3.0.0
>Reporter: Nishan Shetty
>Assignee: Rohith
> Attachments: YARN-2523.1.patch, YARN-2523.patch, YARN-2523.patch
>
>
> 1. Decommission one NodeManager by configuring its IP in the excludehost file
> 2. Remove the IP from the excludehost file
> 3. Execute the -refreshNodes command and restart the decommissioned NodeManager
> Observe that the RM UI shows a negative value for the "Decommissioned Nodes" field



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146937#comment-14146937
 ] 

Hadoop QA commented on YARN-90:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12671047/apache-yarn-90.5.patch
  against trunk revision 9fa5a89.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1270 javac 
compiler warnings (more than the trunk's current 1265 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.TestNonAggregatingLogHandler
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5107//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5107//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5107//console

This message is automatically generated.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch
>
>
> MAPREDUCE-3121 makes the NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), the NodeManager needs a restart. This JIRA is to improve the NodeManager to 
> reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2598) GHS should show N/A instead of null for the inaccessible information

2014-09-24 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2598:
-

 Summary: GHS should show N/A instead of null for the inaccessible 
information
 Key: YARN-2598
 URL: https://issues.apache.org/jira/browse/YARN-2598
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Zhijie Shen


When the user doesn't have access to an application, the app attempt 
information is not visible to the user. ClientRMService will output N/A, but 
GHS shows null, which is not user-friendly.

{code}
14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: 
http://nn.example.com:8188/ws/v1/timeline/
14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at 
nn.example.com/240.0.0.11:8050
14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History 
server at nn.example.com/240.0.0.11:10200
Application Report : 
Application-Id : application_1411586934799_0001
Application-Name : Sleep job
Application-Type : MAPREDUCE
User : hrt_qa
Queue : default
Start-Time : 1411586956012
Finish-Time : 1411586989169
Progress : 100%
State : FINISHED
Final-State : SUCCEEDED
Tracking-URL : null
RPC Port : -1
AM Host : null
Aggregate Resource Allocation : N/A
Diagnostics : null
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2598) GHS should show N/A instead of null for the inaccessible information

2014-09-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2598:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-321

> GHS should show N/A instead of null for the inaccessible information
> 
>
> Key: YARN-2598
> URL: https://issues.apache.org/jira/browse/YARN-2598
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>
> When the user doesn't have access to an application, the app attempt 
> information is not visible to the user. ClientRMService will output N/A, but 
> GHS shows null, which is not user-friendly.
> {code}
> 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: 
> http://nn.example.com:8188/ws/v1/timeline/
> 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at 
> nn.example.com/240.0.0.11:8050
> 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History 
> server at nn.example.com/240.0.0.11:10200
> Application Report : 
>   Application-Id : application_1411586934799_0001
>   Application-Name : Sleep job
>   Application-Type : MAPREDUCE
>   User : hrt_qa
>   Queue : default
>   Start-Time : 1411586956012
>   Finish-Time : 1411586989169
>   Progress : 100%
>   State : FINISHED
>   Final-State : SUCCEEDED
>   Tracking-URL : null
>   RPC Port : -1
>   AM Host : null
>   Aggregate Resource Allocation : N/A
>   Diagnostics : null
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2596) TestWorkPreservingRMRestart for FairScheduler failed on trunk

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146963#comment-14146963
 ] 

Hadoop QA commented on YARN-2596:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12671044/yarn-2596-1.patch
  against trunk revision 9fa5a89.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5106//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5106//console

This message is automatically generated.

> TestWorkPreservingRMRestart for FairScheduler failed on trunk
> -
>
> Key: YARN-2596
> URL: https://issues.apache.org/jira/browse/YARN-2596
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Assignee: Karthik Kambatla
> Attachments: yarn-2596-1.patch
>
>
> As seen in the test results from YARN-668, the test failure can be reproduced 
> locally on trunk without applying the new patch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2056) Disable preemption at Queue level

2014-09-24 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2056:
-
Attachment: YARN-2056.201409242210.txt

[~leftnoteasy],

I'm sorry for the churn on patches, and thanks again for helping me on this.

The current patch maintains the existing behavior and addresses the 
concern you raised in this comment:
https://issues.apache.org/jira/browse/YARN-2056?focusedCommentId=14142404&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14142404

That is, it addresses the case where
{code}
root has A,B,C, total capacity = 90
A.guaranteed = 30, A.pending = 20, A.current = 40
B.guaranteed = 30, B.pending = 0, B.current = 50
C.guaranteed = 30, C.pending = 0, C.current = 0
{code}

It will levelize the over-capacity queues so that A.idealAssigned = 45 and 
B.idealAssigned = 45.
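
For clarity, a rough arithmetic sketch of how those numbers come out under my 
reading of the example above (not the actual scheduler code): C has no usage and 
no pending demand, so the 90 of cluster capacity is shared between the two 
over-capacity queues in proportion to their (equal) guarantees, giving each an 
idealAssigned of 45.
{code}
// Arithmetic sketch only; proportional split by guarantee is my assumption.
public class IdealAssignedSketch {
  public static void main(String[] args) {
    int totalCapacity = 90;
    int aGuaranteed = 30, bGuaranteed = 30;
    int cIdealAssigned = 0;                          // C.pending == 0, C.current == 0
    int remaining = totalCapacity - cIdealAssigned;  // 90 to share between A and B
    int aIdealAssigned = remaining * aGuaranteed / (aGuaranteed + bGuaranteed); // 45
    int bIdealAssigned = remaining - aIdealAssigned;                            // 45
    System.out.println("A.idealAssigned=" + aIdealAssigned
        + " B.idealAssigned=" + bIdealAssigned);
  }
}
{code}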

This patch implements the algorithm I described in comment:
https://issues.apache.org/jira/browse/YARN-2056?focusedCommentId=14145650&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14145650

> Disable preemption at Queue level
> -
>
> Key: YARN-2056
> URL: https://issues.apache.org/jira/browse/YARN-2056
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Mayank Bansal
>Assignee: Eric Payne
> Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
> YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
> YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, 
> YARN-2056.201409232329.txt, YARN-2056.201409242210.txt
>
>
> We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2598) GHS should show N/A instead of null for the inaccessible information

2014-09-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-2598:
-

Assignee: Zhijie Shen

> GHS should show N/A instead of null for the inaccessible information
> 
>
> Key: YARN-2598
> URL: https://issues.apache.org/jira/browse/YARN-2598
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> When the user doesn't have access to an application, the app attempt 
> information is not visible to the user. ClientRMService will output N/A, but 
> GHS shows null, which is not user-friendly.
> {code}
> 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: 
> http://nn.example.com:8188/ws/v1/timeline/
> 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at 
> nn.example.com/240.0.0.11:8050
> 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History 
> server at nn.example.com/240.0.0.11:10200
> Application Report : 
>   Application-Id : application_1411586934799_0001
>   Application-Name : Sleep job
>   Application-Type : MAPREDUCE
>   User : hrt_qa
>   Queue : default
>   Start-Time : 1411586956012
>   Finish-Time : 1411586989169
>   Progress : 100%
>   State : FINISHED
>   Final-State : SUCCEEDED
>   Tracking-URL : null
>   RPC Port : -1
>   AM Host : null
>   Aggregate Resource Allocation : N/A
>   Diagnostics : null
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2596) TestWorkPreservingRMRestart for FairScheduler failed on trunk

2014-09-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146971#comment-14146971
 ] 

Karthik Kambatla commented on YARN-2596:


Thanks, Sandy. I'll go ahead and commit this. 

> TestWorkPreservingRMRestart for FairScheduler failed on trunk
> -
>
> Key: YARN-2596
> URL: https://issues.apache.org/jira/browse/YARN-2596
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Assignee: Karthik Kambatla
> Attachments: yarn-2596-1.patch
>
>
> As seen in the test results from YARN-668, the test failure can be reproduced 
> locally on trunk without applying the new patch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2596) TestWorkPreservingRMRestart fails with FairScheduler

2014-09-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2596:
---
Summary: TestWorkPreservingRMRestart fails with FairScheduler  (was: 
TestWorkPreservingRMRestart for FairScheduler failed on trunk)

> TestWorkPreservingRMRestart fails with FairScheduler
> 
>
> Key: YARN-2596
> URL: https://issues.apache.org/jira/browse/YARN-2596
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Assignee: Karthik Kambatla
> Attachments: yarn-2596-1.patch
>
>
> As seen in the test results from YARN-668, the test failure can be reproduced 
> locally on trunk without applying the new patch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-24 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147034#comment-14147034
 ] 

Craig Welch commented on YARN-796:
--

Some additional info regarding the headroom problem - one of the prototypical 
node label cases is a queue which can access the whole cluster but which can also 
access a particular label ("a").  A mapreduce job is launched on this queue 
with an expression limiting it to "a" nodes.  It will receive headroom 
reflecting access to the whole cluster, even though it can only use "a" nodes.  
This will sometimes result in a deadlock situation where the job starts reducers 
before it should, based on the incorrect (inflated) headroom, and then cannot 
start mappers to complete the map phase, and so is deadlocked.  If there are 
significantly fewer "a" nodes than the total cluster (expected to be a frequent 
case), this deadlock will occur during periods of high or full utilization of 
those nodes (again, desirable and probably typical). 

It is possible to make no change and receive the correct headroom value for a 
very restricted set of configurations.  If queues are restricted to a single 
label (and not * or "also the whole cluster"), and jobs run with a label 
expression selecting that single label, they should get the correct headroom 
values.  Unfortunately, this eliminates a great many use cases/cluster 
configurations, including the one above, which I think is very important to 
support.

A couple of additional details regarding Solution 1 above - in addition to the 
potential to expand the allocate response API to include a map of 
expression->headroom values, this approach also makes it possible to return the 
correct headroom value where it is currently returned, for a job with a single 
expression.  So, in a scenario I think is very likely - the first use case above 
(a queue which can see the whole cluster + a label with "special" nodes, say 
label "GPU"), with a default label expression of "GPU" (used by the job 
throughout), running an unmodified mapreduce job (or hive, etc.) where no 
special support for labels has been added to that component in the platform - 
the correct headroom will be returned.  I think it's important to be able to 
introduce node label usability in a largely backward-compatible way, so that 
mapreduce and the components above can make use of node labels with just 
configuration and the YARN platform implementation, and this is the solution (of 
the ones we've considered) which will make this possible. 
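
To make the Solution 1 idea concrete, here is a hypothetical sketch of a 
per-label-expression headroom map such as an expanded allocate response might 
carry. The class, field names, and numbers below are illustrative assumptions, 
not the existing YARN API.
{code}
// Hypothetical sketch only - not the actual YARN API.
import java.util.HashMap;
import java.util.Map;

public class LabelHeadroomExample {

  // Stand-in for org.apache.hadoop.yarn.api.records.Resource.
  static final class Resource {
    final int memoryMb;
    final int vCores;
    Resource(int memoryMb, int vCores) {
      this.memoryMb = memoryMb;
      this.vCores = vCores;
    }
    public String toString() {
      return "<memory:" + memoryMb + ", vCores:" + vCores + ">";
    }
  }

  public static void main(String[] args) {
    // Headroom keyed by label expression; "" represents the default
    // (whole-cluster) expression.
    Map<String, Resource> headroomByExpression = new HashMap<>();
    headroomByExpression.put("", new Resource(180 * 1024, 180));   // whole cluster
    headroomByExpression.put("GPU", new Resource(16 * 1024, 16));  // only "GPU" nodes

    // An AM that asked for label "GPU" would consult the matching entry
    // instead of the inflated whole-cluster headroom.
    System.out.println("GPU headroom: " + headroomByExpression.get("GPU"));
  }
}
{code}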

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2014-09-24 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147040#comment-14147040
 ] 

Varun Vasudev commented on YARN-2190:
-

Sorry, I cancelled the patch by mistake. Resubmitting.

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch
>
>
> Yarn default container executor on Windows does not set the resource limit on 
> the containers currently. The memory limit is enforced by a separate 
> monitoring thread. The container implementation on Windows uses Job Object 
> right now. The latest Windows (8 or later) API allows CPU and memory limits 
> on the job objects. We want to create a Windows container executor that sets 
> the limits on job objects and thus provides resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-09-24 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-90:
--
Attachment: apache-yarn-90.6.patch

Uploaded a patch with the warnings and failing test cases fixed.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch
>
>
> MAPREDUCE-3121 makes the NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), the NodeManager needs a restart. This JIRA is to improve the NodeManager to 
> reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2599) Standby RM should also expose some jmx and metrics

2014-09-24 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2599:
--

 Summary: Standby RM should also expose some jmx and metrics
 Key: YARN-2599
 URL: https://issues.apache.org/jira/browse/YARN-2599
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Karthik Kambatla


YARN-1898 redirects jmx and metrics to the Active RM. As discussed there, we need 
to separate out the metrics displayed so that the Standby RM can also be monitored. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147052#comment-14147052
 ] 

Junping Du commented on YARN-668:
-

bq. The newly added version id may not be needed. For compatibility issues, we can 
either explicitly check for a version id mismatch and throw a version-mismatch 
exception, or we can just check the specific required field and throw a logic 
exception.
Agreed. I synced up with Vinod offline and he is OK with removing the version id. 
Will remove it soon.

bq. Indentation of the annotation
Nice catch, will fix it soon.

bq. AMRMTokenSecretManager#newInstance(): if this method is only used by tests, 
can we move it to the test code?
Makes sense. I will remove this method, as it is only used once.

bq. previous comment from Vinod: “The proto definitions need to be in 
server-common.”
Things will become more complex if we move the proto to server-common for this 
patch. We would also need to move the proto object inside TokenIdentifier to the 
server side (or make hadoop-yarn-common depend on hadoop-yarn-server, which is 
not correct), which means we would need to get rid of all the getter methods, and 
that affects too many places in the code base. I would suggest a separate 
refactoring patch to move the proto to the server side. Thoughts?

bq. To convert the stream into a byte array: byte[] buffer = IOUtils.toByteArray(dis);
Makes sense. Will replace it.
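
For reference, a small usage sketch of that suggestion, assuming the commons-io 
IOUtils (which Hadoop already depends on); the class name and sample input are 
illustrative, not patch code.
{code}
// Usage sketch only; org.apache.commons.io.IOUtils.toByteArray(InputStream)
// reads the remaining stream into a byte[].
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import org.apache.commons.io.IOUtils;

public class ToByteArraySketch {
  public static void main(String[] args) throws IOException {
    DataInputStream dis =
        new DataInputStream(new ByteArrayInputStream("token-bytes".getBytes("UTF-8")));
    // Replaces a manual read loop when deserializing the token payload.
    byte[] buffer = IOUtils.toByteArray(dis);
    System.out.println(buffer.length + " bytes read");
  }
}
{code}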

> TokenIdentifier serialization should consider Unknown fields
> 
>
> Key: YARN-668
> URL: https://issues.apache.org/jira/browse/YARN-668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-668-demo.patch, YARN-668-v2.patch, 
> YARN-668-v3.patch, YARN-668-v4.patch, YARN-668.patch
>
>
> This would allow changing of the TokenIdentifier between versions. The 
> current serialization is Writable. A simple way to achieve this would be to 
> have a Proto object as the payload for TokenIdentifiers, instead of 
> individual fields.
> TokenIdentifier continues to implement Writable to work with the RPC layer - 
> but the payload itself is serialized using PB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2594) ResourceManger sometimes become un-responsive

2014-09-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2594:
---
Priority: Blocker  (was: Major)

> ResourceManger sometimes become un-responsive
> -
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Blocker
>
> ResourceManager sometimes becomes un-responsive:
> There was an exception in the ResourceManager log, and it contains only the following 
> type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2598) GHS should show N/A instead of null for the inaccessible information

2014-09-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2598:
--
Attachment: YARN-2598.1.patch

Made a patch to fix the issue.
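
As an illustration of the kind of change described (a hypothetical helper, not 
the actual YARN-2598 patch): substitute "N/A" whenever a field that the caller is 
not allowed to see comes back as null. The class and method names below are 
illustrative assumptions.
{code}
// Sketch only; not the actual patch code.
public class NullToNASketch {
  static final String UNAVAILABLE = "N/A";

  static String orNA(String value) {
    return value == null ? UNAVAILABLE : value;
  }

  public static void main(String[] args) {
    String trackingUrl = null;  // e.g. hidden because the user lacks access
    String amHost = null;
    System.out.println("Tracking-URL : " + orNA(trackingUrl));
    System.out.println("AM Host : " + orNA(amHost));
  }
}
{code}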

> GHS should show N/A instead of null for the inaccessible information
> 
>
> Key: YARN-2598
> URL: https://issues.apache.org/jira/browse/YARN-2598
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2598.1.patch
>
>
> When the user doesn't have access to an application, the app attempt 
> information is not visible to the user. ClientRMService will output N/A, but 
> GHS shows null, which is not user-friendly.
> {code}
> 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: 
> http://nn.example.com:8188/ws/v1/timeline/
> 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at 
> nn.example.com/240.0.0.11:8050
> 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History 
> server at nn.example.com/240.0.0.11:10200
> Application Report : 
>   Application-Id : application_1411586934799_0001
>   Application-Name : Sleep job
>   Application-Type : MAPREDUCE
>   User : hrt_qa
>   Queue : default
>   Start-Time : 1411586956012
>   Finish-Time : 1411586989169
>   Progress : 100%
>   State : FINISHED
>   Final-State : SUCCEEDED
>   Tracking-URL : null
>   RPC Port : -1
>   AM Host : null
>   Aggregate Resource Allocation : N/A
>   Diagnostics : null
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2598) GHS should show N/A instead of null for the inaccessible information

2014-09-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2598:
--
Target Version/s: 2.6.0

> GHS should show N/A instead of null for the inaccessible information
> 
>
> Key: YARN-2598
> URL: https://issues.apache.org/jira/browse/YARN-2598
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2598.1.patch
>
>
> When the user doesn't have access to an application, the app attempt 
> information is not visible to the user. ClientRMService will output N/A, but 
> GHS shows null, which is not user-friendly.
> {code}
> 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: 
> http://nn.example.com:8188/ws/v1/timeline/
> 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at 
> nn.example.com/240.0.0.11:8050
> 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History 
> server at nn.example.com/240.0.0.11:10200
> Application Report : 
>   Application-Id : application_1411586934799_0001
>   Application-Name : Sleep job
>   Application-Type : MAPREDUCE
>   User : hrt_qa
>   Queue : default
>   Start-Time : 1411586956012
>   Finish-Time : 1411586989169
>   Progress : 100%
>   State : FINISHED
>   Final-State : SUCCEEDED
>   Tracking-URL : null
>   RPC Port : -1
>   AM Host : null
>   Aggregate Resource Allocation : N/A
>   Diagnostics : null
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147075#comment-14147075
 ] 

Hadoop QA commented on YARN-2056:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12671072/YARN-2056.201409242210.txt
  against trunk revision 9fa5a89.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5108//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5108//console

This message is automatically generated.

> Disable preemption at Queue level
> -
>
> Key: YARN-2056
> URL: https://issues.apache.org/jira/browse/YARN-2056
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Mayank Bansal
>Assignee: Eric Payne
> Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
> YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
> YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, 
> YARN-2056.201409232329.txt, YARN-2056.201409242210.txt
>
>
> We need to be able to disable preemption at the individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2546) REST API for application creation/submission is using strings for numeric & boolean values

2014-09-24 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2546:

Attachment: apache-yarn-2546.1.patch

Uploaded new patch to address [~zjshen]'s comments.

> REST API for application creation/submission is using strings for numeric & 
> boolean values
> --
>
> Key: YARN-2546
> URL: https://issues.apache.org/jira/browse/YARN-2546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 2.5.1
>Reporter: Doug Haigh
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2546.0.patch, apache-yarn-2546.1.patch
>
>
> When YARN responds with or accepts JSON, numbers & booleans are 
> represented as strings, which can cause parsing problems.
> Resource values look like 
> {
>   "application-id":"application_1404198295326_0001",
>   "maximum-resource-capability":
>{
>   "memory":"8192",
>   "vCores":"32"
>}
> }
> Instead of
> {
>   "application-id":"application_1404198295326_0001",
>   "maximum-resource-capability":
>{
>   "memory":8192,
>   "vCores":32
>}
> }
> When I POST to start a job, numeric values are represented as numbers:
>   "local-resources":
>   {
> "entry":
> [
>   {
> "key":"AppMaster.jar",
> "value":
> {
>   
> "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
>   "type":"FILE",
>   "visibility":"APPLICATION",
>   "size": "43004",
>   "timestamp": "1405452071209"
> }
>   }
> ]
>   },
> Instead of
>   "local-resources":
>   {
> "entry":
> [
>   {
> "key":"AppMaster.jar",
> "value":
> {
>   
> "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
>   "type":"FILE",
>   "visibility":"APPLICATION",
>   "size": 43004,
>   "timestamp": 1405452071209
> }
>   }
> ]
>   },
> Similarly, Boolean values are also represented as strings:
> "keep-containers-across-application-attempts":"false"
> Instead of 
> "keep-containers-across-application-attempts":false



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2594) ResourceManger sometimes become un-responsive

2014-09-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147083#comment-14147083
 ] 

Wangda Tan commented on YARN-2594:
--

Hi [~zxu],
You're correct. The problem is that two read-lock threads first deadlock because 
of synchronized access; they then block write-lock acquisition, so the RM 
dispatcher is blocked.
Working on a patch now.

Wangda
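
For readers following the analysis above, the cycle can be reproduced outside the 
RM with a plain ReentrantReadWriteLock and an ordinary monitor. The sketch below 
is illustrative only and is not the RM or scheduler code; the thread names and 
timings are made up to force the interleaving.

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Three-way cycle: a reader that blocks on a monitor, a monitor holder that
// wants the read lock, and a dispatcher-style thread queued for the write lock.
public class ReadWriteDeadlockSketch {
  private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock(true);
  private static final Object MONITOR = new Object();

  public static void main(String[] args) {
    Thread readerThenMonitor = new Thread(() -> {
      LOCK.readLock().lock();       // holds the read lock...
      sleep(200);                   // ...long enough for the others to line up
      synchronized (MONITOR) { }    // blocks: the monitor is held by monitorThenReader
      LOCK.readLock().unlock();
    }, "reader-then-monitor");

    Thread monitorThenReader = new Thread(() -> {
      synchronized (MONITOR) {
        sleep(400);                 // wait until the writer is queued
        LOCK.readLock().lock();     // blocks behind the queued writer
        LOCK.readLock().unlock();
      }
    }, "monitor-then-reader");

    Thread dispatcherWriter = new Thread(() -> {
      sleep(300);
      LOCK.writeLock().lock();      // blocks: a read lock is still held
      LOCK.writeLock().unlock();
    }, "dispatcher-writer");

    readerThenMonitor.start();
    monitorThenReader.start();
    dispatcherWriter.start();
    // All three threads now wait on each other; the "dispatcher" never gets the
    // write lock, which matches the unresponsive-RM symptom in this JIRA.
  }

  private static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}
{code}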

> ResourceManger sometimes become un-responsive
> -
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Blocker
>
> ResourceManager sometimes becomes unresponsive:
> There was no exception in the ResourceManager log; it contains only the 
> following type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2014-09-24 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147086#comment-14147086
 ] 

Varun Vasudev commented on YARN-2190:
-

[~chuanliu] thank you for answering my questions. With respect to vcores - please 
look at [this 
comment|https://issues.apache.org/jira/browse/YARN-2440?focusedCommentId=14107057&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14107057].
 There are other comments in various JIRAs that reflect the same sentiment. 
Assuming a 1-1 mapping between vcores and physical cores is not recommended.

> Provide a Windows container executor that can limit memory and CPU
> --
>
> Key: YARN-2190
> URL: https://issues.apache.org/jira/browse/YARN-2190
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
> YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch
>
>
> The default YARN container executor on Windows does not currently set resource 
> limits on containers. The memory limit is enforced by a separate monitoring 
> thread. The container implementation on Windows uses a Job Object right now. 
> The latest Windows (8 or later) API allows CPU and memory limits on job 
> objects. We want to create a Windows container executor that sets the limits 
> on job objects, thus providing resource enforcement at the OS level.
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2576) Prepare yarn-1051 branch for merging with trunk

2014-09-24 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2576:
-
Attachment: YARN-2576.patch

I am attaching a patch that fixes test-patch issues such as compile errors, 
javadoc warnings, audit issues, etc. after rebasing with trunk. The patch is 
slightly large because many new files were missing ASL headers, which I have added.

> Prepare yarn-1051 branch for merging with trunk
> ---
>
> Key: YARN-2576
> URL: https://issues.apache.org/jira/browse/YARN-2576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-2576.patch
>
>
> This JIRA is to track the changes required to ensure branch yarn-1051 is 
> ready to be merged with trunk. This includes fixing any compilation issues, 
> findbugs and/or javadoc warnings, test case failures, etc., if any.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147090#comment-14147090
 ] 

Hadoop QA commented on YARN-90:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12671081/apache-yarn-90.6.patch
  against trunk revision 3cde37c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5109//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5109//console

This message is automatically generated.

> NodeManager should identify failed disks becoming good back again
> -
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ravi Gummadi
>Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch
>
>
> MAPREDUCE-3121 makes the NodeManager identify disk failures, but once a disk 
> goes down, it is marked as failed forever. To reuse that disk (after it 
> becomes good again), the NodeManager needs a restart. This JIRA is to improve 
> the NodeManager to reuse good disks (which could have been bad some time back).
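
The improvement essentially means re-probing previously failed directories on a 
timer and promoting them back to the good list when the probe succeeds. The 
sketch below shows only that general idea; it is not the NodeManager's actual 
health-check code, and the probe is simplified to a temp-file write test.

{code}
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the "failed disk becomes good again" idea: keep re-probing the
// failed list on a timer and promote directories whose probe succeeds.
public class DiskRecheckSketch {
  private final List<File> goodDirs = new CopyOnWriteArrayList<>();
  private final List<File> failedDirs = new CopyOnWriteArrayList<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public DiskRecheckSketch(List<File> dirs) {
    goodDirs.addAll(dirs);
  }

  public void start(long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(this::recheck, intervalSeconds,
        intervalSeconds, TimeUnit.SECONDS);
  }

  // Simplified probe: a real health check would also verify permissions and space.
  private boolean isUsable(File dir) {
    try {
      File probe = File.createTempFile("disk-probe", null, dir);
      return probe.delete();
    } catch (IOException e) {
      return false;
    }
  }

  private void recheck() {
    for (File dir : goodDirs) {
      if (!isUsable(dir)) {        // demote newly failed directories
        goodDirs.remove(dir);
        failedDirs.add(dir);
      }
    }
    for (File dir : failedDirs) {
      if (isUsable(dir)) {         // promote directories that recovered
        failedDirs.remove(dir);
        goodDirs.add(dir);         // no NodeManager restart needed
      }
    }
  }
}
{code}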



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-668:

Attachment: YARN-668-v5.patch

Thanks [~jianhe] for the review and comments! I have addressed your comments in the v5 patch.

> TokenIdentifier serialization should consider Unknown fields
> 
>
> Key: YARN-668
> URL: https://issues.apache.org/jira/browse/YARN-668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-668-demo.patch, YARN-668-v2.patch, 
> YARN-668-v3.patch, YARN-668-v4.patch, YARN-668-v5.patch, YARN-668.patch
>
>
> This would allow changing of the TokenIdentifier between versions. The 
> current serialization is Writable. A simple way to achieve this would be to 
> have a Proto object as the payload for TokenIdentifiers, instead of 
> individual fields.
> TokenIdentifier continues to implement Writable to work with the RPC layer - 
> but the payload itself is serialized using PB.
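
As a rough illustration of the Proto-payload idea (a sketch only, not the attached 
patch: MyTokenIdentifierProto and its getOwner() field stand in for a generated 
protobuf class):

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.TokenIdentifier;

// Sketch: the identifier stays Writable for the RPC layer, but its whole
// payload is a single protobuf blob, so unknown fields survive version skew.
public class ProtoBackedTokenIdentifier extends TokenIdentifier {
  public static final Text KIND = new Text("EXAMPLE_TOKEN");

  // MyTokenIdentifierProto is an assumed, generated protobuf class.
  private MyTokenIdentifierProto proto = MyTokenIdentifierProto.getDefaultInstance();

  @Override
  public void write(DataOutput out) throws IOException {
    byte[] bytes = proto.toByteArray();   // serialize the payload with PB
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    int len = in.readInt();
    byte[] bytes = new byte[len];
    in.readFully(bytes);
    // PB preserves fields it does not recognize, so an older reader can still
    // deserialize an identifier written by a newer version.
    proto = MyTokenIdentifierProto.parseFrom(bytes);
  }

  @Override
  public Text getKind() {
    return KIND;
  }

  @Override
  public UserGroupInformation getUser() {
    return UserGroupInformation.createRemoteUser(proto.getOwner()); // getOwner() assumed
  }
}
{code}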



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2576) Prepare yarn-1051 branch for merging with trunk

2014-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147103#comment-14147103
 ] 

Hadoop QA commented on YARN-2576:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12671092/YARN-2576.patch
  against trunk revision 3cde37c.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5113//console

This message is automatically generated.

> Prepare yarn-1051 branch for merging with trunk
> ---
>
> Key: YARN-2576
> URL: https://issues.apache.org/jira/browse/YARN-2576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-2576.patch
>
>
> This JIRA is to track the changes required to ensure branch yarn-1051 is 
> ready to be merged with trunk. This includes fixing any compilation issues, 
> findbugs and/or javadoc warnings, test case failures, etc., if any.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2576) Prepare yarn-1051 branch for merging with trunk

2014-09-24 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147105#comment-14147105
 ] 

Carlo Curino commented on YARN-2576:


Subru's comment refers to the YARN-1051 branch... 


> Prepare yarn-1051 branch for merging with trunk
> ---
>
> Key: YARN-2576
> URL: https://issues.apache.org/jira/browse/YARN-2576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-2576.patch
>
>
> This JIRA is to track the changes required to ensure branch yarn-1051 is 
> ready to be merged with trunk. This includes fixing any compilation issues, 
> findbugs and/or javadoc warnings, test case failures, etc., if any.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

