[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553708#comment-14553708
 ] 

Hadoop QA commented on YARN-3655:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 58s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 47s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 48s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m 22s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  87m 39s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734328/YARN-3655.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fb6b38d |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8037/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8037/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8037/console |


This message was automatically generated.

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, no other application has any chance 
> to assign a new container on this node unless the application that reserves 
> the node assigns a new container on it or releases the reserved container.
> The problem is that if an application calls assignReservedContainer and fails 
> to get a new container due to the maxAMShare limitation, it blocks all other 
> applications from using the nodes it has reserved. If all the other running 
> applications cannot release their AM containers because they are blocked by 
> these reserved containers, a livelock situation can happen.
> The following is the code at FSAppAttempt#assignContainer which can cause 
> this potential livelock.
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
> List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>   ask.get(0).getCapability())) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Skipping allocation because maxAMShare limit would " +
>   "be exceeded");
> }
> return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM 
> container on it due to the maxAMShare limitation and the node is reserved by 
> the application.
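
A rough sketch of that direction, reusing existing FSAppAttempt/FSSchedulerNode 
helper names (unreserve, getReservedContainer); treat it as an assumption about 
the shape of the fix, not the attached patch:

{code}
// Sketch (not YARN-3655.002): inside the maxAMShare check above, before
// returning Resources.none(), drop this attempt's own reservation on the
// node so other applications can schedule on it again.
RMContainer reservedContainer = node.getReservedContainer();
if (reservedContainer != null
    && reservedContainer.getApplicationAttemptId().equals(
        getApplicationAttemptId())) {
  unreserve(reservedContainer.getReservedPriority(), node);
}
return Resources.none();
{code}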



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat

2015-05-20 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-1012:
--
Attachment: YARN-1012-4.patch

Added missing files.
Fixed some of the comments.

> NM should report resource utilization of running containers to RM in heartbeat
> --
>
> Key: YARN-1012
> URL: https://issues.apache.org/jira/browse/YARN-1012
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Arun C Murthy
>Assignee: Inigo Goiri
> Attachments: YARN-1012-1.patch, YARN-1012-2.patch, YARN-1012-3.patch, 
> YARN-1012-4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat

2015-05-20 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553688#comment-14553688
 ] 

Inigo Goiri commented on YARN-1012:
---

I don't know how I missed the missing files... I've been checking this for 
days. Fixed now.

Agreed and fixed 1, 2, 3, and 4.

I don't know what to do with 5... your call.

> NM should report resource utilization of running containers to RM in heartbeat
> --
>
> Key: YARN-1012
> URL: https://issues.apache.org/jira/browse/YARN-1012
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Arun C Murthy
>Assignee: Inigo Goiri
> Attachments: YARN-1012-1.patch, YARN-1012-2.patch, YARN-1012-3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat

2015-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553646#comment-14553646
 ] 

Karthik Kambatla commented on YARN-1012:


Looks like the patch is missing ResourceUtilizationPBImpl, and hence doesn't 
build. Could you please include those new files as well? 

Comments on the patch itself:
# Given this is all new code, let us hold off on exposing it to end users just 
yet. Can we mark ContainerStatus#getUtilization Public-Unstable?
# Is there a reason folks would want to turn off tracking utilization? If not, 
let us get rid of the config and always track it? 
# When logging at debug level, we want to check if debug logging is enabled to 
avoid string creation and concatenation (see the sketch after this list).
# I notice that we are using a float for virtual_cores. Do we anticipate using 
this value in any calculations? If yes, should we change this to millivcores 
as an int instead, to avoid those floating point operations? Given this is 
just tracking utilization, I suspect we won't do any calculations.
# In ContainerMonitorsImpl, we save utilization and then set container metrics. 
Should we leave this as is? Or should we link them up so that 
ContainerMonitorsImpl is aware of only one of them?
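
A minimal illustration of the guard in item 3 (the names here are generic 
placeholders, not taken from the patch):

{code}
// Only build the debug string when debug logging is actually on; otherwise
// the concatenation below would run on every call for nothing.
if (LOG.isDebugEnabled()) {
  LOG.debug("Container " + containerId + " utilization: " + utilization);
}
{code}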

> NM should report resource utilization of running containers to RM in heartbeat
> --
>
> Key: YARN-1012
> URL: https://issues.apache.org/jira/browse/YARN-1012
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Arun C Murthy
>Assignee: Inigo Goiri
> Attachments: YARN-1012-1.patch, YARN-1012-2.patch, YARN-1012-3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-05-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553643#comment-14553643
 ] 

Sunil G commented on YARN-2005:
---

Hi [~adhoot],
I started working on this a little earlier and did some analysis on it. 
Please feel free to start working on it, and I can help you with the reviews. 
If work on any other sub-parts is needed to finish this, please let me know 
and I can give you a hand. 
Thank you.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.
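
A hypothetical sketch of the bookkeeping this would involve; every name below 
is made up for illustration and none of it is an existing YARN API:

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-application tracker: a node becomes ineligible for AM
// placement once it has failed a configurable number of AM attempts.
public class AMBlacklistTracker {
  private final int maxAmFailuresPerNode;
  private final Map<String, Integer> amFailuresByNode =
      new HashMap<String, Integer>();

  public AMBlacklistTracker(int maxAmFailuresPerNode) {
    this.maxAmFailuresPerNode = maxAmFailuresPerNode;
  }

  public synchronized void recordAmFailure(String nodeId) {
    Integer count = amFailuresByNode.get(nodeId);
    amFailuresByNode.put(nodeId, count == null ? 1 : count + 1);
  }

  public synchronized boolean isBlacklistedForAm(String nodeId) {
    Integer count = amFailuresByNode.get(nodeId);
    return count != null && count >= maxAmFailuresPerNode;
  }
}
{code}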



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553636#comment-14553636
 ] 

zhihai xu commented on YARN-3655:
-

Thanks [~asuresh] for the review. I think the flip-flop won't happen.
bq. At some time T2, the next allocation event (after all nodes have sent 
heartbeat.. or after a continuousScheduling attempt) happens, a reservation of 
2GB is made on each node for appX.
The above reservation won't succeed because of the maxAMShare limitation.
If it did succeed, then the reservation for appX wouldn't be removed.

Thanks [~kasha] for your review; these are great suggestions.
I made the changes based on your suggestions. I also fixed the fitsInMaxShare 
issue in this JIRA instead of creating a follow-up JIRA, and did some 
optimizations to remove duplicate logic.
I found that hasContainerForNode already covers getTotalRequiredResources: 
if we check hasContainerForNode, we don't need to check 
getTotalRequiredResources. So I removed the getTotalRequiredResources check in 
assignReservedContainer and assignContainer.
Also, because okToUnreserve already checks hasContainerForNode, we don't need 
to check it again for the reserved container in assignContainer.
I uploaded a new patch, YARN-3655.002.patch, with the above changes.
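
A rough sketch of the consolidated check described above; hasContainerForNode 
and unreserve are existing FSAppAttempt methods, but the surrounding shape is 
illustrative only and not the YARN-3655.002 patch:

{code}
// Sketch: hasContainerForNode() already implies there is an outstanding
// request that fits this node, so a separate getTotalRequiredResources()
// check is redundant on both the reserved and unreserved paths.
if (!hasContainerForNode(priority, node)) {
  // Nothing left to place here; if this attempt holds the reservation
  // (assumed on the reserved path), drop it so the node becomes usable
  // by other applications again.
  if (node.getReservedContainer() != null) {
    unreserve(priority, node);
  }
  return Resources.none();
}
// ... continue with the normal container assignment path ...
{code}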

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, no other application has any chance 
> to assign a new container on this node unless the application that reserves 
> the node assigns a new container on it or releases the reserved container.
> The problem is that if an application calls assignReservedContainer and fails 
> to get a new container due to the maxAMShare limitation, it blocks all other 
> applications from using the nodes it has reserved. If all the other running 
> applications cannot release their AM containers because they are blocked by 
> these reserved containers, a livelock situation can happen.
> The following is the code at FSAppAttempt#assignContainer which can cause 
> this potential livelock.
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
> List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>   ask.get(0).getCapability())) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Skipping allocation because maxAMShare limit would " +
>   "be exceeded");
> }
> return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM 
> container on it due to the maxAMShare limitation and the node is reserved by 
> the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3655:

Attachment: YARN-3655.002.patch

> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation 
> -
>
> Key: YARN-3655
> URL: https://issues.apache.org/jira/browse/YARN-3655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
> YARN-3655.002.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container 
> reservation.
> If a node is reserved by an application, no other application has any chance 
> to assign a new container on this node unless the application that reserves 
> the node assigns a new container on it or releases the reserved container.
> The problem is that if an application calls assignReservedContainer and fails 
> to get a new container due to the maxAMShare limitation, it blocks all other 
> applications from using the nodes it has reserved. If all the other running 
> applications cannot release their AM containers because they are blocked by 
> these reserved containers, a livelock situation can happen.
> The following is the code at FSAppAttempt#assignContainer which can cause 
> this potential livelock.
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
> List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>   ask.get(0).getCapability())) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Skipping allocation because maxAMShare limit would " +
>   "be exceeded");
> }
> return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we can't allocate the AM 
> container on it due to the maxAMShare limitation and the node is reserved by 
> the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3645) ResourceManager can't start success if attribute value of "aclSubmitApps" is null in fair-scheduler.xml

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553598#comment-14553598
 ] 

Hadoop QA commented on YARN-3645:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 14s | The applied patch generated  5 
new checkstyle issues (total was 27, now 28). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m  8s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  87m 14s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734306/YARN-3645.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6329bd0 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8036/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8036/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8036/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8036/console |


This message was automatically generated.

> ResourceManager can't start success if  attribute value of "aclSubmitApps" is 
> null in fair-scheduler.xml
> 
>
> Key: YARN-3645
> URL: https://issues.apache.org/jira/browse/YARN-3645
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.2
>Reporter: zhoulinlin
> Attachments: YARN-3645.1.patch, YARN-3645.patch
>
>
> The "aclSubmitApps" is configured in fair-scheduler.xml like below:
> 
> 
>  
> The resourcemanager log:
> 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed 
> to initialize FairScheduler
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed 
> to initialize FairScheduler
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159)
> Caused by: java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   ... 7 more
> Caused by: java.lang.NullPointerException
>   a

[jira] [Assigned] (YARN-3692) Allow REST API to set a user generated message when killing an application

2015-05-20 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-3692:


Assignee: Rohith

> Allow REST API to set a user generated message when killing an application
> --
>
> Key: YARN-3692
> URL: https://issues.apache.org/jira/browse/YARN-3692
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajat Jain
>Assignee: Rohith
>
> Currently YARN's REST API supports killing an application without setting a 
> diagnostic message. It would be good to provide that support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash

2015-05-20 Thread lachisis (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lachisis updated YARN-3614:
---
Attachment: YARN-3614-1.patch

> FileSystemRMStateStore throw exception when failed to remove application, 
> that cause resourcemanager to crash
> -
>
> Key: YARN-3614
> URL: https://issues.apache.org/jira/browse/YARN-3614
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.7.0
>Reporter: lachisis
>Priority: Critical
>  Labels: patch
> Fix For: 2.7.1
>
> Attachments: YARN-3614-1.patch
>
>
> FileSystemRMStateStore is only an auxiliary plug-in of the RM state store. 
> When it fails to remove an application, I think a warning is enough, but 
> currently the ResourceManager crashes.
> Recently, I configured 
> "yarn.resourcemanager.state-store.max-completed-applications" to limit the 
> number of applications kept in the state store. When that number exceeds the 
> limit, some old applications are removed. If the removal fails, the 
> ResourceManager will crash.
> The following is the log: 
> 2015-05-11 06:58:43,815 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing 
> info for app: application_1430994493305_0053
> 2015-05-11 06:58:43,815 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
>  Removing info for app: application_1430994493305_0053 at: 
> /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> 2015-05-11 06:58:43,816 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> removing app: application_1430994493305_0053
> java.lang.Exception: Failed to delete 
> /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> 2015-05-11 06:58:43,819 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> java.lang.Exception: Failed to delete 
> /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMa

[jira] [Updated] (YARN-3645) ResourceManager can't start success if attribute value of "aclSubmitApps" is null in fair-scheduler.xml

2015-05-20 Thread Gabor Liptak (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Liptak updated YARN-3645:
---
Attachment: YARN-3645.1.patch

> ResourceManager can't start success if  attribute value of "aclSubmitApps" is 
> null in fair-scheduler.xml
> 
>
> Key: YARN-3645
> URL: https://issues.apache.org/jira/browse/YARN-3645
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.2
>Reporter: zhoulinlin
> Attachments: YARN-3645.1.patch, YARN-3645.patch
>
>
> The "aclSubmitApps" is configured in fair-scheduler.xml like below:
> 
> 
>  
> The resourcemanager log:
> 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed 
> to initialize FairScheduler
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed 
> to initialize FairScheduler
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159)
> Caused by: java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   ... 7 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:458)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:337)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1299)
>   ... 9 more
> 2015-05-14 12:59:48,623 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> to standby state
> 2015-05-14 12:59:48,623 INFO 
> com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory: plugin 
> transitionToStandbyIn
> 2015-05-14 12:59:48,623 WARN org.apache.hadoop.service.AbstractService: When 
> stopping the service ResourceManager : java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory.transitionToStandbyIn(YarnPlatformPluginProxyFactory.java:71)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:997)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1058)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159)
> 2015-05-14 12:59:48,623 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed 
> to initialize FairScheduler
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.

[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-20 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553482#comment-14553482
 ] 

Anubhav Dhoot commented on YARN-3675:
-

The failure does not repro locally for me and seems unrelated.

> FairScheduler: RM quits when node removal races with continousscheduling on 
> the same node
> -
>
> Key: YARN-3675
> URL: https://issues.apache.org/jira/browse/YARN-3675
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3675.001.patch, YARN-3675.002.patch, 
> YARN-3675.003.patch
>
>
> With continuous scheduling, scheduling can be done on a node that has just 
> been removed, causing errors like the one below.
> {noformat}
> 12:28:53.782 AM FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 12:28:53.783 AMINFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
> {noformat}
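
A hedged sketch of the kind of guard that closes this race (not necessarily 
the committed fix); getFSSchedulerNode, unreserveResource, and application 
stand in for the scheduler's own node lookup and reservation release:

{code}
// Sketch: the node can disappear between the scheduling decision and the
// unreserve call, so check for null instead of letting the dispatcher die.
FSSchedulerNode node = getFSSchedulerNode(rmContainer.getNodeId());
if (node == null) {
  LOG.info("Node already removed; nothing to unreserve for "
      + rmContainer.getContainerId());
} else {
  node.unreserveResource(application);
}
{code}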



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-05-20 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553451#comment-14553451
 ] 

Jun Gong commented on YARN-3480:


{quote}
Without doing this, we will unnecessarily be forcing apps to lose history 
simply because the platform cannot recover quickly enough.
Thinking more, how about we only have (limits + asynchronous recovery) for 
services, once YARN-1039 goes in? Non-service apps anyways are not expected to 
have a lot of app-attempts.
{quote}

It is reasonable. I will update the patch once YARN-1039 goes in.

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it 
> is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, 
> this makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts and 
> makes the RM recovery process much slower. It might be better to cap the 
> number of attempts stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set 
> to a small value, the number of retried attempts can be very large, so we 
> need to delete some of the attempts stored in the RMStateStore.
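
A minimal sketch of the capping idea in plain Java (not any of the attached 
patches); storeAttempt and removeAttempt are placeholders for the real 
state-store calls:

{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: keep only the newest maxStoredAttempts attempts in the store,
// deleting the oldest ones as new attempts are persisted.
public class AttemptStoreLimiter {
  private final Deque<String> storedAttemptIds = new ArrayDeque<String>();
  private final int maxStoredAttempts;

  public AttemptStoreLimiter(int maxStoredAttempts) {
    this.maxStoredAttempts = maxStoredAttempts;
  }

  public synchronized void onAttemptStored(String attemptId) {
    storeAttempt(attemptId);                         // placeholder
    storedAttemptIds.addLast(attemptId);
    while (storedAttemptIds.size() > maxStoredAttempts) {
      removeAttempt(storedAttemptIds.removeFirst()); // placeholder
    }
  }

  private void storeAttempt(String attemptId) { /* state-store write */ }
  private void removeAttempt(String attemptId) { /* state-store delete */ }
}
{code}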



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3654) ContainerLogsPage web UI should not have meta-refresh

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553424#comment-14553424
 ] 

Hudson commented on YARN-3654:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7877 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7877/])
YARN-3654. ContainerLogsPage web UI should not have meta-refresh. Contributed 
by Xuan Gong (jianhe: rev 6329bd00fa1f17cc9555efa496ea7607ad93e0ce)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebAppFilter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java


> ContainerLogsPage web UI should not have meta-refresh
> -
>
> Key: YARN-3654
> URL: https://issues.apache.org/jira/browse/YARN-3654
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-3654.1.patch, YARN-3654.2.patch
>
>
> Currently, when we try to find the container logs for a finished 
> application, it redirects to the URL configured for yarn.log.server.url in 
> yarn-site.xml. But in ContainerLogsPage, we are using meta-refresh:
> {code}
> set(TITLE, join("Redirecting to log server for ", $(CONTAINER_ID)));
> html.meta_http("refresh", "1; url=" + redirectUrl);
> {code}
> which is not good for browsers that require meta-refresh to be enabled in 
> their security settings, especially IE, where meta-refresh is considered a 
> security hole.
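
One way to avoid meta-refresh is to issue a plain HTTP redirect from a servlet 
filter; the commit file list above touches NMWebAppFilter, so the direction is 
probably similar, but the snippet below is only a sketch and not the committed 
code (buildLogServerUrl is a placeholder):

{code}
@Override
public void doFilter(ServletRequest request, ServletResponse response,
    FilterChain chain) throws IOException, ServletException {
  HttpServletRequest req = (HttpServletRequest) request;
  HttpServletResponse resp = (HttpServletResponse) response;
  // Placeholder: compute the log-server URL for finished containers.
  String redirectUrl = buildLogServerUrl(req);
  if (redirectUrl != null) {
    // Plain 302 redirect instead of a page relying on meta-refresh.
    resp.sendRedirect(redirectUrl);
    return;
  }
  chain.doFilter(request, response);
}
{code}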



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3505:

Target Version/s: 2.8.0  (was: 2.8.0, 2.7.1)

> Node's Log Aggregation Report with SUCCEED should not cached in RMApps
> --
>
> Key: YARN-3505
> URL: https://issues.apache.org/jira/browse/YARN-3505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 2.8.0
>Reporter: Junping Du
>Assignee: Xuan Gong
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
> YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
> YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch
>
>
> Per the discussions in YARN-1402, we shouldn't cache every node's log 
> aggregation report in RMApps forever, especially those that finished with 
> SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553384#comment-14553384
 ] 

Karthik Kambatla commented on YARN-2942:


Thanks everyone for the discussion. Clearly, there are trade-offs to make 
between (1) a single aggregation across nodes for an application with a 
slightly higher chance of losing a container's logs if a node were to go down 
vs (2) a two-step aggregation that places more load on HDFS. While looking at 
this trade-off, we should consider HDFS state today and possible improvements 
in the future. If HDFS were to support concurrent-append, option 1 seems like a 
better approach. 



> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3609) Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553377#comment-14553377
 ] 

Hadoop QA commented on YARN-3609:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734279/YARN-3609.3.branch-2.7.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8966d42 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8035/console |


This message was automatically generated.

> Move load labels from storage from serviceInit to serviceStart to make it 
> works with RM HA case.
> 
>
> Key: YARN-3609
> URL: https://issues.apache.org/jira/browse/YARN-3609
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3609.1.preliminary.patch, YARN-3609.2.patch, 
> YARN-3609.3.branch-2.7.patch, YARN-3609.3.patch
>
>
> RMNodeLabelsManager currently loads labels in serviceInit, but 
> RMActiveService.start() is what is called when an RM HA transition happens.
> We haven't done this before because queue initialization happens in 
> serviceInit as well, and we need to make sure labels are added to the system 
> before the queues are initialized; after YARN-2918, we should be able to do 
> this.
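
A minimal sketch of the move, using the standard AbstractService hooks 
(illustrative shape only, not the committed patch; initNodeLabelStore and 
loadLabelsFromStore are placeholders for the existing logic):

{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  // Set up the store here, but no longer load labels at init time.
  initNodeLabelStore(conf);
  super.serviceInit(conf);
}

@Override
protected void serviceStart() throws Exception {
  // Runs when RMActiveServices starts, i.e. on every HA active transition,
  // so labels are (re)loaded from storage at that point.
  loadLabelsFromStore();
  super.serviceStart();
}
{code}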



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3609) Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.

2015-05-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3609:
-
Attachment: YARN-3609.3.branch-2.7.patch

Attached branch-2.7 patch.

> Move load labels from storage from serviceInit to serviceStart to make it 
> works with RM HA case.
> 
>
> Key: YARN-3609
> URL: https://issues.apache.org/jira/browse/YARN-3609
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3609.1.preliminary.patch, YARN-3609.2.patch, 
> YARN-3609.3.branch-2.7.patch, YARN-3609.3.patch
>
>
> RMNodeLabelsManager currently loads labels in serviceInit, but 
> RMActiveService.start() is what is called when an RM HA transition happens.
> We haven't done this before because queue initialization happens in 
> serviceInit as well, and we need to make sure labels are added to the system 
> before the queues are initialized; after YARN-2918, we should be able to do 
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553369#comment-14553369
 ] 

Li Lu commented on YARN-3411:
-

I looked at the latest patch and think it's in good shape for performance 
benchmarks. I've also run it with a local single-node HBase cluster, and it 
worked fine with our performance benchmark application as well as the PI 
sample application.

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553370#comment-14553370
 ] 

Sangjin Lee commented on YARN-3411:
---

The latest patch LGTM. I'm fine with having a follow-up JIRA to address 
Junping's comment (and other minor issues if any). Once everyone chimes in and 
gives it +1, I'd be happy to commit this patch.

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553364#comment-14553364
 ] 

Hadoop QA commented on YARN-3051:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 55s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 39s | The applied patch generated  6  
additional warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
2 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 19s | The applied patch generated  
23 new checkstyle issues (total was 234, now 257). |
| {color:green}+1{color} | shellcheck |   0m  6s | There were no new shellcheck 
(v0.3.3) issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 40s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m 36s | The patch appears to introduce 6 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   1m  3s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  43m 47s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-timelineservice |
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineReaderImpl.getEntities(String,
 String, String, Long, Long, Long, String, Long, Collection, Collection, 
Collection, Collection, Collection, EnumSet):in 
org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineReaderImpl.getEntities(String,
 String, String, Long, Long, Long, String, Long, Collection, Collection, 
Collection, Collection, Collection, EnumSet): new java.io.FileReader(File)  At 
FileSystemTimelineReaderImpl.java:[line 88] |
|  |  Found reliance on default encoding in 
org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineReaderImpl.getEntity(String,
 String, String, String, Collection, Collection, Long, Long, EnumSet):in 
org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineReaderImpl.getEntity(String,
 String, String, String, Collection, Collection, Long, Long, EnumSet): new 
java.io.FileReader(File)  At FileSystemTimelineReaderImpl.java:[line 68] |
| FindBugs | module:hadoop-yarn-common |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.builder;
 locked 92% of time  Unsynchronized access at AllocateResponsePBImpl.java:92% 
of time  Unsynchronized access at AllocateResponsePBImpl.java:[line 391] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.proto;
 locked 94% of time  Unsynchronized access at AllocateResponsePBImpl.java:94% 
of time  Unsynchronized access at AllocateResponsePBImpl.java:[line 391] |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.viaProto;
 locked 94% of time  Unsynchronized access at AllocateResponsePBImpl.java:94% 
of time  Unsynchronized access at AllocateResponsePBImpl.java:[line 391] |
| FindBugs | module:hadoop-yarn-api |
|  |  
org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric$1.compare(Long,
 Long) negates the return value of Long.compareTo(Long)  At 
TimelineMetric.java:value of Long.compareTo(Long)  At TimelineMetric.java:[line 
47] |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734255/YARN-3051-YARN-2928.03.patch
 |
| Optional Tests | shellcheck javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 463e070 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/8034/artifact/patchprocess/diffJavadocWarnings.txt
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/8034/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8034/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8034/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit

[jira] [Commented] (YARN-3609) Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553359#comment-14553359
 ] 

Hudson commented on YARN-3609:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7876 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7876/])
YARN-3609. Load node labels from storage inside RM serviceStart. Contributed by 
Wangda Tan (jianhe: rev 8966d4217969eb71767ba83a3ff2b5bb38189b19)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHAForNodeLabels.java


> Move load labels from storage from serviceInit to serviceStart to make it 
> works with RM HA case.
> 
>
> Key: YARN-3609
> URL: https://issues.apache.org/jira/browse/YARN-3609
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3609.1.preliminary.patch, YARN-3609.2.patch, 
> YARN-3609.3.patch
>
>
> RMNodeLabelsManager currently loads labels in serviceInit, but 
> RMActiveService.start() is what is called when an RM HA transition happens.
> We haven't done this before because queue initialization happens in 
> serviceInit as well, and we need to make sure labels are added to the system 
> before the queues are initialized; after YARN-2918, we should be able to do 
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553352#comment-14553352
 ] 

Vrushali C commented on YARN-3411:
--

bq. Or we cannot diff value with null and real 0.

Hi Junping,
So, currently, you are right, we can't differentiate between nulls and real 0.  
In hRaven, we use 0 in case of nulls for long or int values. But for things 
like timestamps, we need stricter checks. After the performance test, I will 
file a jira to ensure we handle this more carefully and return null (Long 
object) in case it's actually null. Hope that is fine. 

thanks
Vrushali
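
A tiny illustration of the distinction being discussed, in plain Java (not 
hRaven and not the patch; deserializeLong is a placeholder for the actual 
byte-decoding call):

{code}
// Returning a boxed Long lets callers tell "no value stored" apart from a
// real 0; a primitive long would silently turn absence into 0.
Long readTimestamp(byte[] cellValue) {
  if (cellValue == null || cellValue.length == 0) {
    return null;                       // genuinely missing
  }
  return deserializeLong(cellValue);   // placeholder
}
{code}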

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3647) RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object

2015-05-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553323#comment-14553323
 ] 

Wangda Tan commented on YARN-3647:
--

Latest patch LGTM.

> RMWebServices api's should use updated api from CommonNodeLabelsManager to 
> get NodeLabel object
> ---
>
> Key: YARN-3647
> URL: https://issues.apache.org/jira/browse/YARN-3647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-3647.patch, 0002-YARN-3647.patch
>
>
> After YARN-3579, the RMWebServices apis can use the updated version of the apis in 
> CommonNodeLabelsManager, which gives the full NodeLabel object instead of creating a 
> NodeLabel object from a plain label name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553302#comment-14553302
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  4s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 15s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 38s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 14s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  37m 30s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734244/YARN-3411-YARN-2928.007.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 463e070 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8033/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8033/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8033/console |


This message was automatically generated.

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553284#comment-14553284
 ] 

Hadoop QA commented on YARN-2556:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   6m 53s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   9m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 31s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 19s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   2m  2s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 51s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests |  99m 47s | Tests failed in 
hadoop-mapreduce-client-jobclient. |
| | | 120m 54s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | org.apache.hadoop.mapred.TestMerge |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734234/YARN-2556.10.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 03f897f |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8031/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8031/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8031/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8031/console |


This message was automatically generated.

> Tool to measure the performance of the timeline server
> --
>
> Key: YARN-2556
> URL: https://issues.apache.org/jira/browse/YARN-2556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Chang Li
>  Labels: BB2015-05-TBR
> Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
> YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.2.patch, YARN-2556.3.patch, 
> YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, 
> YARN-2556.8.patch, YARN-2556.9.patch, YARN-2556.patch, yarn2556.patch, 
> yarn2556.patch, yarn2556_wip.patch
>
>
> We need to be able to understand the capacity model for the timeline server 
> to give users the tools they need to deploy a timeline server with the 
> correct capacity.
> I propose we create a mapreduce job that can measure timeline server write 
> and read performance. Transactions per second, I/O for both read and write 
> would be a good start.
> This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-05-20 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3051:
---
Attachment: YARN-3051-YARN-2928.03.patch

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.03.patch, 
> YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553189#comment-14553189
 ] 

Xuan Gong commented on YARN-3681:
-

Committed into trunk/branch-2/branch-2.7. Thanks, Craig and Varun.

> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Fix For: 2.7.1
>
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, 
> YARN-3681.1.patch, YARN-3681.branch-2.0.patch, yarncmd.png
>
>
> Attached is the screenshot of the command prompt in Windows running the yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553186#comment-14553186
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic], sure, don't worry about the test code clean up for now. I'll 
try it locally. 

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3411:
-
Attachment: YARN-3411-YARN-2928.007.patch


Uploading YARN-3411-YARN-2928.007.patch. I think I have addressed everyone's 
comments; I have been scrolling up and down this jira page since yesterday and 
hope I have not missed any comment. 

[~gtCarrera9] I have not yet moved the test data into TestTimelineWriterImpl, 
since it sets up almost the same information for the timeline entity, but with 
more cases. I can modify it later. I have also tested the HBase writer with 
Sangjin's driver code. 

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3681:
--
Attachment: YARN-3681.branch-2.0.patch

Here is one for branch-2

> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, 
> YARN-3681.1.patch, YARN-3681.branch-2.0.patch, yarncmd.png
>
>
> Attached is the screenshot of the command prompt in Windows running the yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2015-05-20 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553165#comment-14553165
 ] 

Nathan Roberts commented on YARN-3388:
--

Thanks [~leftnoteasy] for the comments. I agree 2b is the way to go. I will 
upload a new patch soon.

> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch, 
> YARN-3388-v2.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553159#comment-14553159
 ] 

Hudson commented on YARN-3681:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7875 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7875/])
YARN-3681. yarn cmd says "could not find main class 'queue'" in windows. 
(xgong: rev 5774f6b1e577ee64bde8c7c1e39f404b9e651176)
* hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* hadoop-yarn-project/CHANGES.txt


> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, 
> YARN-3681.1.patch, yarncmd.png
>
>
> Attached is the screenshot of the command prompt in Windows running the yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553158#comment-14553158
 ] 

Hudson commented on YARN-2918:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7875 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7875/])
Move YARN-2918 from 2.8.0 to 2.7.1 (wangda: rev 
03f897fd1a3779251023bae358207069b89addbf)
* hadoop-yarn-project/CHANGES.txt


> Don't fail RM if queue's configured labels are not existed in 
> cluster-node-labels
> -
>
> Key: YARN-2918
> URL: https://issues.apache.org/jira/browse/YARN-2918
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Wangda Tan
> Fix For: 2.8.0, 2.7.1
>
> Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch
>
>
> Currently, if the admin sets up labels on queues 
> ({{.accessible-node-labels = ...}}) and a label is not added to the 
> RM, the queue's initialization will fail and the RM will fail too:
> {noformat}
> 2014-12-03 20:11:50,126 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> ...
> Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, 
> please check.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.<init>(AbstractCSQueue.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> {noformat}
> This is not a good user experience; we should stop failing the RM so that the admin can 
> configure queues/labels in the following steps:
> - Configure queue (with label)
> - Start RM
> - Add labels to RM
> - Submit applications
> Now admin has to:
> - Configure queue (without label)
> - Start RM
> - Add labels to RM
> - Refresh queue's config (with label)
> - Submit applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553151#comment-14553151
 ] 

Hadoop QA commented on YARN-3675:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 34s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 46s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  50m  4s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  86m 17s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734207/YARN-3675.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4aa730c |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8030/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8030/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8030/console |


This message was automatically generated.

> FairScheduler: RM quits when node removal races with continousscheduling on 
> the same node
> -
>
> Key: YARN-3675
> URL: https://issues.apache.org/jira/browse/YARN-3675
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3675.001.patch, YARN-3675.002.patch, 
> YARN-3675.003.patch
>
>
> With continuous scheduling, scheduling can be done on a node that was just 
> removed, causing errors like the one below.
> {noformat}
> 12:28:53.782 AM FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 12:28:53.783 AMINFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553114#comment-14553114
 ] 

Xuan Gong commented on YARN-3681:
-

Using git apply -p0 --whitespace=fix, I could apply the patch.
The patch looks good to me.
+1, will commit.

> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, 
> YARN-3681.1.patch, yarncmd.png
>
>
> Attached is the screenshot of the command prompt in Windows running the yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels

2015-05-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553086#comment-14553086
 ] 

Wangda Tan commented on YARN-2918:
--

Back-ported this patch to 2.7.1, updating fix version.

> Don't fail RM if queue's configured labels are not existed in 
> cluster-node-labels
> -
>
> Key: YARN-2918
> URL: https://issues.apache.org/jira/browse/YARN-2918
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Wangda Tan
> Fix For: 2.8.0, 2.7.1
>
> Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch
>
>
> Currently, if the admin sets up labels on queues 
> ({{.accessible-node-labels = ...}}) and a label is not added to the 
> RM, the queue's initialization will fail and the RM will fail too:
> {noformat}
> 2014-12-03 20:11:50,126 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> ...
> Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, 
> please check.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.<init>(AbstractCSQueue.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> {noformat}
> This is not a good user experience; we should stop failing the RM so that the admin can 
> configure queues/labels in the following steps:
> - Configure queue (with label)
> - Start RM
> - Add labels to RM
> - Submit applications
> Now admin has to:
> - Configure queue (without label)
> - Start RM
> - Add labels to RM
> - Refresh queue's config (with label)
> - Submit applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels

2015-05-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2918:
-
Fix Version/s: 2.7.1

> Don't fail RM if queue's configured labels are not existed in 
> cluster-node-labels
> -
>
> Key: YARN-2918
> URL: https://issues.apache.org/jira/browse/YARN-2918
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Wangda Tan
> Fix For: 2.8.0, 2.7.1
>
> Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch
>
>
> Currently, if the admin sets up labels on queues 
> ({{.accessible-node-labels = ...}}) and a label is not added to the 
> RM, the queue's initialization will fail and the RM will fail too:
> {noformat}
> 2014-12-03 20:11:50,126 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> ...
> Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, 
> please check.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.<init>(AbstractCSQueue.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> {noformat}
> This is not a good user experience; we should stop failing the RM so that the admin can 
> configure queues/labels in the following steps:
> - Configure queue (with label)
> - Start RM
> - Add labels to RM
> - Submit applications
> Now admin has to:
> - Configure queue (without label)
> - Start RM
> - Add labels to RM
> - Refresh queue's config (with label)
> - Submit applications



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server

2015-05-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-2556:
---
Attachment: YARN-2556.10.patch

Added a JobHistoryFileReplayMapper.

> Tool to measure the performance of the timeline server
> --
>
> Key: YARN-2556
> URL: https://issues.apache.org/jira/browse/YARN-2556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Chang Li
>  Labels: BB2015-05-TBR
> Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
> YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.2.patch, YARN-2556.3.patch, 
> YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, 
> YARN-2556.8.patch, YARN-2556.9.patch, YARN-2556.patch, yarn2556.patch, 
> yarn2556.patch, yarn2556_wip.patch
>
>
> We need to be able to understand the capacity model for the timeline server 
> to give users the tools they need to deploy a timeline server with the 
> correct capacity.
> I propose we create a mapreduce job that can measure timeline server write 
> and read performance. Transactions per second, I/O for both read and write 
> would be a good start.
> This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-3691) FairScheduler: Limit number of reservations for a container

2015-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553001#comment-14553001
 ] 

Karthik Kambatla edited comment on YARN-3691 at 5/20/15 8:09 PM:
-

The number of reservations should be per container and not per application? If 
an app is looking to get resources for 10 containers, it should be able to make 
reservations independently for each container. 


was (Author: kasha):
The number of reservations should be per component and not per application? If 
an app is looking to get resources for 10 containers, it should be able to make 
reservations independently for each container. 

> FairScheduler: Limit number of reservations for a container
> ---
>
> Key: YARN-3691
> URL: https://issues.apache.org/jira/browse/YARN-3691
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Currently, it is possible to reserve resources for an app on all nodes. 
> Limiting this to possibly just a number of nodes (or a ratio of the total 
> cluster size) would improve utilization of the cluster and reduce the 
> possibility of starving other apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2015-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553015#comment-14553015
 ] 

Karthik Kambatla commented on YARN-314:
---

I am essentially proposing an efficient way to index the pending requests 
across multiple axes. Each of these indices is captured by a map. The only 
reason to colocate them is to avoid dispersing this indexing (mapping) logic 
across multiple classes. 

We should be able to quickly look up all requests for an app for reporting etc., 
and also look up all node-local requests across applications at schedule time, 
without having to iterate through all the applications. 

The maps could be - >>, >>. Current {{AppSchedulingInfo}} 
could stay as is and use the former map to get the corresponding requests.
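
A rough sketch of the two-axis index described above, assuming simple 
per-location container counts (the concrete map types in the original comment 
were lost in archiving, so every name and type below is illustrative rather 
than the actual {{AppSchedulingInfo}} structures):

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class PendingRequestIndexSketch {
  // Index 1: application -> (resource name -> outstanding container count).
  private final Map<String, Map<String, Integer>> byApp = new HashMap<>();
  // Index 2: resource name (node, rack or "*") -> (application -> count).
  private final Map<String, Map<String, Integer>> byLocation = new HashMap<>();

  // Record a pending request once, keeping both indices consistent.
  public void addRequest(String appId, String resourceName, int numContainers) {
    byApp.computeIfAbsent(appId, k -> new HashMap<>())
        .merge(resourceName, numContainers, Integer::sum);
    byLocation.computeIfAbsent(resourceName, k -> new HashMap<>())
        .merge(appId, numContainers, Integer::sum);
  }

  // Per-app view, e.g. for reporting.
  public Map<String, Integer> requestsForApp(String appId) {
    return byApp.getOrDefault(appId, Collections.emptyMap());
  }

  // Per-location view across applications, e.g. at schedule time.
  public Map<String, Integer> requestsAtLocation(String resourceName) {
    return byLocation.getOrDefault(resourceName, Collections.emptyMap());
  }
}
{code}

Keeping both maps behind a single class is what avoids dispersing the indexing 
logic across multiple classes.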

> Schedulers should allow resource requests of different sizes at the same 
> priority and location
> --
>
> Key: YARN-314
> URL: https://issues.apache.org/jira/browse/YARN-314
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
> Attachments: yarn-314-prelim.patch
>
>
> Currently, resource requests for the same container and locality are expected 
> to all be the same size.
> While it doesn't look like it's needed for apps currently, and can be 
> circumvented by specifying different priorities if absolutely necessary, it 
> seems to me that the ability to request containers with different resource 
> requirements at the same priority level should be there for the future and 
> for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553000#comment-14553000
 ] 

Hadoop QA commented on YARN-3686:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 29s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m 20s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  86m 14s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734160/0002-YARN-3686.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4aa730c |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8029/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8029/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8029/console |


This message was automatically generated.

> CapacityScheduler should trim default_node_label_expression
> ---
>
> Key: YARN-3686
> URL: https://issues.apache.org/jira/browse/YARN-3686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch
>
>
> We should trim default_node_label_expression for queue before using it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3691) FairScheduler: Limit number of reservations for a container

2015-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553001#comment-14553001
 ] 

Karthik Kambatla commented on YARN-3691:


The number of reservations should be per component and not per application? If 
an app is looking to get resources for 10 containers, it should be able to make 
reservations independently for each container. 

> FairScheduler: Limit number of reservations for a container
> ---
>
> Key: YARN-3691
> URL: https://issues.apache.org/jira/browse/YARN-3691
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Currently, it is possible to reserve resources for an app on all nodes. 
> Limiting this to possibly just a number of nodes (or a ratio of the total 
> cluster size) would improve utilization of the cluster and reduce the 
> possibility of starving other apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3691) FairScheduler: Limit number of reservations for a container

2015-05-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3691:
---
Summary: FairScheduler: Limit number of reservations for a container  (was: 
Limit number of reservations for an app)

> FairScheduler: Limit number of reservations for a container
> ---
>
> Key: YARN-3691
> URL: https://issues.apache.org/jira/browse/YARN-3691
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Currently, it is possible to reserve resources for an app on all nodes. 
> Limiting this to possibly just a number of nodes (or a ratio of the total 
> cluster size) would improve utilization of the cluster and reduce the 
> possibility of starving other apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3467) Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI

2015-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552959#comment-14552959
 ] 

Karthik Kambatla commented on YARN-3467:


We should add this information to the ApplicationAttempt page, and also preferably 
to the RM Web UI. I have heard asks for both the number of containers and allocated 
resources on the RM applications page, so people can sort applications by them. 

> Expose allocatedMB, allocatedVCores, and runningContainers metrics on running 
> Applications in RM Web UI
> ---
>
> Key: YARN-3467
> URL: https://issues.apache.org/jira/browse/YARN-3467
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp, yarn
>Affects Versions: 2.5.0
>Reporter: Anthony Rojas
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: ApplicationAttemptPage.png
>
>
> The YARN REST API can report on the following properties:
> *allocatedMB*: The sum of memory in MB allocated to the application's running 
> containers
> *allocatedVCores*: The sum of virtual cores allocated to the application's 
> running containers
> *runningContainers*: The number of containers currently running for the 
> application
> Currently, the RM Web UI does not report on these items (at least I couldn't 
> find any entries within the Web UI).
> It would be useful for YARN Application and Resource troubleshooting to have 
> these properties and their corresponding values exposed on the RM WebUI.
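
For reference, a minimal stand-alone sketch of pulling these properties from the 
existing apps REST endpoint; the host, port, and query string are placeholders, 
and the snippet simply dumps the raw JSON instead of parsing out allocatedMB, 
allocatedVCores, and runningContainers.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RmAppsMetricsSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder RM web address; adjust host/port for a real cluster.
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps?states=RUNNING");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      // Each app entry in the response carries allocatedMB, allocatedVCores
      // and runningContainers; here we just print the payload as-is.
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}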



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552958#comment-14552958
 ] 

Hadoop QA commented on YARN-2355:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 38s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 39s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |  50m  1s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 14s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734179/YARN-2355.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4aa730c |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8028/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8028/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8028/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8028/console |


This message was automatically generated.

> MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
> --
>
> Key: YARN-2355
> URL: https://issues.apache.org/jira/browse/YARN-2355
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Darrell Taylor
>  Labels: newbie
> Attachments: YARN-2355.001.patch
>
>
> After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether 
> it has another chance to retry based on MAX_APP_ATTEMPTS_ENV alone. We should be 
> able to notify the application of the up-to-date remaining retry quota.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-20 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3675:

Attachment: YARN-3675.003.patch

Removed spurious changes and changed visibility of attemptScheduling

> FairScheduler: RM quits when node removal races with continousscheduling on 
> the same node
> -
>
> Key: YARN-3675
> URL: https://issues.apache.org/jira/browse/YARN-3675
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3675.001.patch, YARN-3675.002.patch, 
> YARN-3675.003.patch
>
>
> With continuous scheduling, scheduling can be done on a node that was just 
> removed, causing errors like the one below.
> {noformat}
> 12:28:53.782 AM FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 12:28:53.783 AMINFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2408) Resource Request REST API for YARN

2015-05-20 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552849#comment-14552849
 ] 

Renan DelValle commented on YARN-2408:
--


[~leftnoteasy], thanks for taking a look at the patch; I really appreciate it.

1) I agree, the original patch I had was very verbose, so I shrank the 
amount of data being transferred by clustering resource requests together. 
That seems to be the best alternative to keeping the original ResourceRequest structures.

2) I will take a look at that and implement it that way. (Thank you for 
pointing me in the right direction.) On the resource-by-label inclusion, do you 
think it would be better to wait until it is patched into trunk, in order to 
make the process easier?


> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
> Attachments: YARN-2408-6.patch
>
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested, to gain more insight into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API (a JSON counterpart is also available):
> {code:xml}
> 
>   7680
>   7
>   
> application_1412191664217_0001
> 
> appattempt_1412191664217_0001_01
> default
> 6144
> 6
> 3
> 
>   
> 1024
> 1
> 6
> true
> 20
> 
>   localMachine
>   /default-rack
>   *
> 
>   
> 
>   
>   
>   ...
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552841#comment-14552841
 ] 

Hadoop QA commented on YARN-3675:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 44s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 47s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m 51s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  87m 29s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734156/YARN-3675.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4aa730c |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8025/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8025/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8025/console |


This message was automatically generated.

> FairScheduler: RM quits when node removal races with continousscheduling on 
> the same node
> -
>
> Key: YARN-3675
> URL: https://issues.apache.org/jira/browse/YARN-3675
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3675.001.patch, YARN-3675.002.patch
>
>
> With continuous scheduling, scheduling can be done on a node that was just 
> removed, causing errors like the one below.
> {noformat}
> 12:28:53.782 AM FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 12:28:53.783 AMINFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3692) Allow REST API to set a user generated message when killing an application

2015-05-20 Thread Rajat Jain (JIRA)
Rajat Jain created YARN-3692:


 Summary: Allow REST API to set a user generated message when 
killing an application
 Key: YARN-3692
 URL: https://issues.apache.org/jira/browse/YARN-3692
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Rajat Jain


Currently, YARN's REST API supports killing an application, but without setting a 
diagnostic message. It would be good to provide that support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2015-05-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552800#comment-14552800
 ] 

Wangda Tan commented on YARN-314:
-

[~kasha],
Actually, I'm not quite sure about this proposal. What's the benefit of putting 
all apps' requests together compared to holding one data structure per app? Is 
there any use case?

> Schedulers should allow resource requests of different sizes at the same 
> priority and location
> --
>
> Key: YARN-314
> URL: https://issues.apache.org/jira/browse/YARN-314
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
> Attachments: yarn-314-prelim.patch
>
>
> Currently, resource requests for the same container and locality are expected 
> to all be the same size.
> While it doesn't look like it's needed for apps currently, and can be 
> circumvented by specifying different priorities if absolutely necessary, it 
> seems to me that the ability to request containers with different resource 
> requirements at the same priority level should be there for the future and 
> for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container

2015-05-20 Thread Darrell Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darrell Taylor updated YARN-2355:
-
Attachment: YARN-2355.001.patch

> MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
> --
>
> Key: YARN-2355
> URL: https://issues.apache.org/jira/browse/YARN-2355
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Darrell Taylor
>  Labels: newbie
> Attachments: YARN-2355.001.patch
>
>
> After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether 
> it has another chance to retry based on MAX_APP_ATTEMPTS_ENV alone. We should be 
> able to notify the application of the up-to-date remaining retry quota.
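
For background, today an AM can only read the static maximum from its 
environment and compare it with its own attempt number, roughly as sketched 
below. The env var name is the one this JIRA refers to; parsing the current 
attempt number from the container id is elided, so the value here is a 
placeholder.

{code:java}
/** Sketch of the status quo: only the static maximum is visible to the AM. */
public class RemainingAttemptsSketch {
  public static void main(String[] args) {
    String max = System.getenv("MAX_APP_ATTEMPTS");  // set for AM containers
    int maxAttempts = (max == null) ? 1 : Integer.parseInt(max);
    int currentAttempt = 1;  // placeholder; a real AM parses this from its container id
    // The point of this JIRA: the static maximum alone is no longer enough,
    // because attempt-failure validity windows etc. change the effective
    // remaining quota at runtime.
    System.out.println("attempts left (naive) = " + (maxAttempts - currentAttempt));
  }
}
{code}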



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552738#comment-14552738
 ] 

Varun Saxena commented on YARN-3681:


[~cwelch], it has to do with line endings.
I have to run {{unix2dos}} to convert the line endings for Jenkins to accept it. 
Patches to Windows batch files do not always apply, depending on the user's 
line-ending settings. I think that is why my patch did not apply for you.

> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, 
> YARN-3681.1.patch, yarncmd.png
>
>
> Attached the screenshot of the command prompt in windows running yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552736#comment-14552736
 ] 

Hadoop QA commented on YARN-3681:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734165/YARN-3681.1.patch |
| Optional Tests |  |
| git revision | trunk / 4aa730c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8027/console |


This message was automatically generated.

> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, 
> YARN-3681.1.patch, yarncmd.png
>
>
> Attached the screenshot of the command prompt in windows running yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3681:
--
Attachment: YARN-3681.1.patch

Oh the irony, neither did my own patch apply.  Updated to one which does.

> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, 
> YARN-3681.1.patch, yarncmd.png
>
>
> Attached the screenshot of the command prompt in windows running yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3691) Limit number of reservations for an app

2015-05-20 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-3691:
-

 Summary: Limit number of reservations for an app
 Key: YARN-3691
 URL: https://issues.apache.org/jira/browse/YARN-3691
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Arun Suresh


Currently, it is possible to reserve resources for an app on all nodes. Limiting 
this to a fixed number of nodes (or a ratio of the total cluster size) 
would improve utilization of the cluster and reduce the possibility of 
starving other apps.
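
To illustrate the proposal, a hypothetical per-app cap is sketched below; the 
threshold, field names and placement are all invented and do not reflect any 
attached patch.

{code:java}
/** Hypothetical sketch of a per-app reservation cap (names invented). */
public class ReservationCapSketch {
  private final int maxReservedNodesPerApp;
  private int reservedNodes;  // nodes this app currently holds reservations on

  ReservationCapSketch(int maxReservedNodesPerApp) {
    this.maxReservedNodesPerApp = maxReservedNodesPerApp;
  }

  /** Only allow a new reservation while the app is under its cap. */
  boolean tryReserve() {
    if (reservedNodes >= maxReservedNodesPerApp) {
      return false;  // skip: leave this node free for other apps
    }
    reservedNodes++;
    return true;
  }

  void unreserve() {
    reservedNodes = Math.max(0, reservedNodes - 1);
  }
}
{code}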



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3691) Limit number of reservations for an app

2015-05-20 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-3691:
-

Assignee: Arun Suresh

> Limit number of reservations for an app
> ---
>
> Key: YARN-3691
> URL: https://issues.apache.org/jira/browse/YARN-3691
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Currently, it is possible to reserve resources for an app on all nodes. 
> Limiting this to a fixed number of nodes (or a ratio of the total 
> cluster size) would improve utilization of the cluster and reduce the 
> possibility of starving other apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552710#comment-14552710
 ] 

Hadoop QA commented on YARN-3681:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734163/YARN-3681.0.patch |
| Optional Tests |  |
| git revision | trunk / 4aa730c |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8026/console |


This message was automatically generated.

> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png
>
>
> Attached the screenshot of the command prompt in windows running yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression

2015-05-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552711#comment-14552711
 ] 

Wangda Tan commented on YARN-3686:
--

[~sunilg], thanks for working on this. Comments:
- I think you can try adding it to 
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeNodeLabelExpressionInRequest(ResourceRequest,
 QueueInfo)}}, which needs to trim the node-label-expression as well.
- Actually this is a regression: in 2.6, a queue's node label expression containing 
spaces could be set up without any issue. It's better to add tests to make sure 1. 
spaces in the resource request are trimmed, and 2. spaces in the queue configuration 
(default-node-label-expression) are trimmed (a minimal trim sketch follows below).
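
For concreteness, the trimming being asked for is just the null-safe pattern 
below; the helper name is invented, and wiring it into SchedulerUtils and the 
queue's default-node-label-expression setup is per the comment above, not a 
statement about the attached patches.

{code:java}
/** Illustrative trim helper for node label expressions. */
public class NodeLabelTrim {
  static String normalizeLabelExpression(String expr) {
    if (expr == null) {
      return null;
    }
    String trimmed = expr.trim();
    // Treat a whitespace-only expression the same as an empty one.
    return trimmed.isEmpty() ? "" : trimmed;
  }

  public static void main(String[] args) {
    System.out.println("[" + normalizeLabelExpression("  gpu ") + "]");  // prints [gpu]
  }
}
{code}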

> CapacityScheduler should trim default_node_label_expression
> ---
>
> Key: YARN-3686
> URL: https://issues.apache.org/jira/browse/YARN-3686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch
>
>
> We should trim default_node_label_expression for queue before using it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3467) Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI

2015-05-20 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552699#comment-14552699
 ] 

Anubhav Dhoot commented on YARN-3467:
-

Attaching the ApplicationAttempt page. It does show the number of running 
containers, but it does not show the actual overall allocated resources for the 
application attempt.

> Expose allocatedMB, allocatedVCores, and runningContainers metrics on running 
> Applications in RM Web UI
> ---
>
> Key: YARN-3467
> URL: https://issues.apache.org/jira/browse/YARN-3467
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp, yarn
>Affects Versions: 2.5.0
>Reporter: Anthony Rojas
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: ApplicationAttemptPage.png
>
>
> The YARN REST API can report on the following properties:
> *allocatedMB*: The sum of memory in MB allocated to the application's running 
> containers
> *allocatedVCores*: The sum of virtual cores allocated to the application's 
> running containers
> *runningContainers*: The number of containers currently running for the 
> application
> Currently, the RM Web UI does not report on these items (at least I couldn't 
> find any entries within the Web UI).
> It would be useful for YARN Application and Resource troubleshooting to have 
> these properties and their corresponding values exposed on the RM WebUI.
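
These figures are already exposed programmatically, so the UI change is mostly 
plumbing. The sketch below reads them through the YarnClient API; the method 
names are believed correct for the 2.x client but should be verified against 
the target release.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

/** Sketch: print the same metrics the JIRA wants surfaced in the web UI. */
public class AppUsageProbe {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      for (ApplicationReport report : client.getApplications()) {
        ApplicationResourceUsageReport usage =
            report.getApplicationResourceUsageReport();
        System.out.println(report.getApplicationId()
            + " runningContainers=" + usage.getNumUsedContainers()
            + " allocatedMB=" + usage.getUsedResources().getMemory()
            + " allocatedVCores=" + usage.getUsedResources().getVirtualCores());
      }
    } finally {
      client.stop();
    }
  }
}
{code}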



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552700#comment-14552700
 ] 

Craig Welch commented on YARN-3681:
---

[~varun_saxena] the patch you had doesn't apply properly for me. I've uploaded 
a patch which does the same things, does apply, and which I've had the 
opportunity to test.

@xgong, can you take a look at this one (.0.patch)?  Thanks.

> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png
>
>
> Attached the screenshot of the command prompt in windows running yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows

2015-05-20 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3681:
--
Attachment: YARN-3681.0.patch

> yarn cmd says "could not find main class 'queue'" in windows
> 
>
> Key: YARN-3681
> URL: https://issues.apache.org/jira/browse/YARN-3681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
> Environment: Windows Only
>Reporter: Sumana Sathish
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: windows, yarn-client
> Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png
>
>
> Attached the screenshot of the command prompt in windows running yarn queue 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3467) Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI

2015-05-20 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3467:

Attachment: ApplicationAttemptPage.png

> Expose allocatedMB, allocatedVCores, and runningContainers metrics on running 
> Applications in RM Web UI
> ---
>
> Key: YARN-3467
> URL: https://issues.apache.org/jira/browse/YARN-3467
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp, yarn
>Affects Versions: 2.5.0
>Reporter: Anthony Rojas
>Assignee: Anubhav Dhoot
>Priority: Minor
> Attachments: ApplicationAttemptPage.png
>
>
> The YARN REST API can report on the following properties:
> *allocatedMB*: The sum of memory in MB allocated to the application's running 
> containers
> *allocatedVCores*: The sum of virtual cores allocated to the application's 
> running containers
> *runningContainers*: The number of containers currently running for the 
> application
> Currently, the RM Web UI does not report on these items (at least I couldn't 
> find any entries within the Web UI).
> It would be useful for YARN Application and Resource troubleshooting to have 
> these properties and their corresponding values exposed on the RM WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-20 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552686#comment-14552686
 ] 

Craig Welch commented on YARN-3626:
---

The checkstyle warning looks insignificant.

[~cnauroth], [~vinodkv], I've changed the approach to use the environment 
instead of configuration as suggested. Can one of you please review?

> On Windows localized resources are not moved to the front of the classpath 
> when they should be
> --
>
> Key: YARN-3626
> URL: https://issues.apache.org/jira/browse/YARN-3626
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
> Environment: Windows
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.7.1
>
> Attachments: YARN-3626.0.patch, YARN-3626.11.patch, 
> YARN-3626.14.patch, YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch
>
>
> In response to the mapreduce.job.user.classpath.first setting, the classpath 
> is ordered differently so that localized resources will appear before system 
> classpath resources when tasks execute.  On Windows this does not work 
> because the localized resources are not linked into their final location when 
> the classpath jar is created.  To compensate for that, localized jar resources 
> are added directly to the classpath generated for the jar rather than being 
> discovered from the localized directories.  Unfortunately, they are always 
> appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3685) NodeManager unnecessarily knows about classpath-jars due to Windows limitations

2015-05-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552680#comment-14552680
 ] 

Chris Nauroth commented on YARN-3685:
-

[~vinodkv], thanks for the notification.  I was not aware of this design goal 
at the time of YARN-316.

Perhaps it's possible to move the classpath jar generation to the MR client or 
AM.  It's not immediately obvious to me which of those 2 choices is better.  
We'd need to change the manifest to use relative paths in the Class-Path 
attribute instead of absolute paths.  (The client and AM are not aware of the 
exact layout of the NodeManager's {{yarn.nodemanager.local-dirs}}, so the 
client can't predict the absolute paths at time of container launch.)

There is one piece of logic that I don't see how to handle though.  Some 
classpath entries are defined in terms of environment variables.  These 
environment variables are expanded at the NodeManager via the container launch 
scripts.  This was true of Linux even before YARN-316, so in that sense, YARN 
did already have some classpath logic indirectly.  Environment variables cannot 
be used inside a manifest's Class-Path, so for Windows, NodeManager expands the 
environment variables before populating Class-Path.  It would be incorrect to 
do the environment variable expansion at the MR client, because it might be 
running with different configuration than the NodeManager.  I suppose if the AM 
did the expansion, then that would work in most cases, but it creates an 
assumption that the AM container is running with configuration that matches all 
NodeManagers in the cluster.  I don't believe that assumption exists today.

If we do move classpath handling out of the NodeManager, then it would be a 
backwards-incompatible change, and so it could not be shipped in the 2.x 
release line.
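
To make the relative Class-Path idea concrete, here is a standalone sketch of 
generating such a jar with java.util.jar; the entry names are placeholders and 
this is not the NodeManager's current implementation.

{code:java}
import java.io.FileOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Sketch: a classpath-only jar whose Class-Path uses relative entries. */
public class RelativeClasspathJar {
  public static void main(String[] args) throws Exception {
    Manifest manifest = new Manifest();
    Attributes attrs = manifest.getMainAttributes();
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    // Relative Class-Path entries resolve against the directory containing the
    // jar, so the client would not need to know the NM's local-dirs layout.
    attrs.put(Attributes.Name.CLASS_PATH, "lib/job.jar lib/other.jar conf/");
    try (JarOutputStream out =
             new JarOutputStream(new FileOutputStream("classpath.jar"), manifest)) {
      // No entries needed; the manifest alone carries the classpath.
    }
  }
}
{code}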

> NodeManager unnecessarily knows about classpath-jars due to Windows 
> limitations
> ---
>
> Key: YARN-3685
> URL: https://issues.apache.org/jira/browse/YARN-3685
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Found this while looking at cleaning up ContainerExecutor via YARN-3648, 
> making it a sub-task.
> YARN *should not* know about classpaths. Our original design was modeled around 
> this. But when we added Windows support, due to classpath issues, we ended 
> up breaking this abstraction via YARN-316. We should clean this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3686) CapacityScheduler should trim default_node_label_expression

2015-05-20 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3686:
--
Attachment: 0002-YARN-3686.patch

Uploading another patch covering a negative scenario.

> CapacityScheduler should trim default_node_label_expression
> ---
>
> Key: YARN-3686
> URL: https://issues.apache.org/jira/browse/YARN-3686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch
>
>
> We should trim default_node_label_expression for queue before using it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-20 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3675:

Attachment: YARN-3675.002.patch

Fixed the checkstyle issue.

> FairScheduler: RM quits when node removal races with continousscheduling on 
> the same node
> -
>
> Key: YARN-3675
> URL: https://issues.apache.org/jira/browse/YARN-3675
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3675.001.patch, YARN-3675.002.patch
>
>
> With continuous scheduling, scheduling can be done on a node that was just 
> removed, causing errors like the ones below.
> {noformat}
> 12:28:53.782 AM FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 12:28:53.783 AMINFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-05-20 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552669#comment-14552669
 ] 

Anubhav Dhoot commented on YARN-2005:
-

Assigning to myself as I am starting work on this. [~sunilg], let me know if 
you have already made progress on this.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.
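
A hypothetical sketch of the bookkeeping such a feature would need (an AM 
failure count per node, compared against a configurable threshold); none of 
the names below come from YARN code or any patch.

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-app bookkeeping for AM-launch blacklisting. */
public class AmBlacklistSketch {
  private final int failureThreshold;  // configurable, e.g. 2
  private final Map<String, Integer> amFailuresPerNode =
      new HashMap<String, Integer>();

  AmBlacklistSketch(int failureThreshold) {
    this.failureThreshold = failureThreshold;
  }

  void recordAmFailure(String nodeId) {
    Integer count = amFailuresPerNode.get(nodeId);
    amFailuresPerNode.put(nodeId, count == null ? 1 : count + 1);
  }

  /** The scheduler would skip this node when placing the next AM attempt. */
  boolean isBlacklistedForAm(String nodeId) {
    Integer count = amFailuresPerNode.get(nodeId);
    return count != null && count >= failureThreshold;
  }
}
{code}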



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2005) Blacklisting support for scheduling AMs

2015-05-20 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-2005:
---

Assignee: Anubhav Dhoot

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3647) RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object

2015-05-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552619#comment-14552619
 ] 

Sunil G commented on YARN-3647:
---

Test case failure and findbugs error are not related to this patch.

> RMWebServices api's should use updated api from CommonNodeLabelsManager to 
> get NodeLabel object
> ---
>
> Key: YARN-3647
> URL: https://issues.apache.org/jira/browse/YARN-3647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-3647.patch, 0002-YARN-3647.patch
>
>
> After YARN-3579, the RMWebServices APIs can use the updated APIs in 
> CommonNodeLabelsManager, which return the full NodeLabel object instead of 
> creating a NodeLabel object from a plain label name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552603#comment-14552603
 ] 

Hudson commented on YARN-3677:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/])
YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by 
Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java


> Fix findbugs warnings in yarn-server-resourcemanager
> 
>
> Key: YARN-3677
> URL: https://issues.apache.org/jira/browse/YARN-3677
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Akira AJISAKA
>Assignee: Vinod Kumar Vavilapalli
>Priority: Minor
>  Labels: newbie
> Fix For: 2.7.1
>
> Attachments: YARN-3677-20150519.txt
>
>
> There is 1 findbugs warning in FileSystemRMStateStore.java.
> {noformat}
> Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of 
> time
> Unsynchronized access at FileSystemRMStateStore.java: [line 156]
> Field 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS
> Synchronized 66% of the time
> Synchronized access at FileSystemRMStateStore.java: [line 148]
> Synchronized access at FileSystemRMStateStore.java: [line 859]
> {noformat}
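
For readers unfamiliar with this findbugs category: the warning means the field 
is read both with and without the lock held. The generic remedies are to make 
the field volatile or to route every access through the same monitor, as 
sketched below; this is illustrative and not necessarily what the attached 
patch does.

{code:java}
/** Generic fix pattern for an "inconsistent synchronization" warning. */
public class ConsistentFlag {
  private boolean isHdfs;  // guarded by 'this' (alternatively, make it volatile)

  synchronized void setHdfs(boolean value) {
    isHdfs = value;
  }

  synchronized boolean isHdfs() {  // previously-unsynchronized readers now lock too
    return isHdfs;
  }
}
{code}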



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552591#comment-14552591
 ] 

Hudson commented on YARN-3583:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/])
YARN-3583. Support of NodeLabel object instead of plain String in YarnClient 
side. (Sunil G via wangda) (wangda: rev 
563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto


> Support of NodeLabel object instead of plain String in YarnClient side.
> ---
>
> Key: YARN-3583
> URL: https://issues.apache.org/jira/browse/YARN-3583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 
> 0003-YARN-3583.patch, 0004-YARN-3583.patch
>
>
> Similar to YARN-3521, use NodeLabel objects in YarnClient side apis.
> getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of 
> using plain label name.
> This will help to bring other label details such as Exclusivity to client 
> side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552601#comment-14552601
 ] 

Hudson commented on YARN-3302:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/])
YARN-3302. TestDockerContainerExecutor should run automatically if it can 
detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: 
rev c97f32e7b9d9e1d4c80682cc01741579166174d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt


> TestDockerContainerExecutor should run automatically if it can detect docker 
> in the usual place
> ---
>
> Key: YARN-3302
> URL: https://issues.apache.org/jira/browse/YARN-3302
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Ravi Prakash
>Assignee: Ravindra Kumar Naik
> Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, 
> YARN-3302-trunk.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552597#comment-14552597
 ] 

Hudson commented on YARN-2821:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java


> Distributed shell app master becomes unresponsive sometimes
> ---
>
> Key: YARN-2821
> URL: https://issues.apache.org/jira/browse/YARN-2821
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
> YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
> apache-yarn-2821.1.patch
>
>
> We've noticed that once in a while the distributed shell app master becomes 
> unresponsive and is eventually killed by the RM. snippet of the logs -
> {noformat}
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
> appattempt_1415123350094_0017_01 received 0 previous attempts' running 
> containers on AM registration.
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez2:45454
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=1
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_02, 
> containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
> container launch container for 
> containerid=container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> onprem-tez2:45454
> 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> QUERY_CONTAINER for Container container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> onprem-tez2:45454
> 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez3:45454
> 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez4:45454
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=3
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_03, 
> containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_04, 
> containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_05, 
> containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distrib

[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552596#comment-14552596
 ] 

Hudson commented on YARN-3565:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/])
YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel 
object instead of String. (Naganarasimha G R via wangda) (wangda: rev 
b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java


> NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object 
> instead of String
> -
>
> Key: YARN-3565
> URL: https://issues.apache.org/jira/browse/YARN-3565
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, 
> YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch
>
>
> Now NM HB/Register uses Set<String>, it will be hard to add new fields if we 
> want to support specifying NodeLabel type such as exclusivity/constraints, 
> etc. We need to make sure rolling upgrade works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552592#comment-14552592
 ] 

Hudson commented on YARN-3601:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/])
YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei 
Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/CHANGES.txt


> Fix UT TestRMFailover.testRMWebAppRedirect
> --
>
> Key: YARN-3601
> URL: https://issues.apache.org/jira/browse/YARN-3601
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
> Environment: Red Hat Enterprise Linux Workstation release 6.5 
> (Santiago)
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
>  Labels: test
> Fix For: 2.7.1
>
> Attachments: YARN-3601.001.patch
>
>
> This test case was not working since the commit from YARN-2605. It failed 
> with NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552549#comment-14552549
 ] 

Hudson commented on YARN-3601:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/])
YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei 
Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java


> Fix UT TestRMFailover.testRMWebAppRedirect
> --
>
> Key: YARN-3601
> URL: https://issues.apache.org/jira/browse/YARN-3601
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
> Environment: Red Hat Enterprise Linux Workstation release 6.5 
> (Santiago)
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
>  Labels: test
> Fix For: 2.7.1
>
> Attachments: YARN-3601.001.patch
>
>
> This test case was not working since the commit from YARN-2605. It failed 
> with NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552554#comment-14552554
 ] 

Hudson commented on YARN-2821:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java
* hadoop-yarn-project/CHANGES.txt


> Distributed shell app master becomes unresponsive sometimes
> ---
>
> Key: YARN-2821
> URL: https://issues.apache.org/jira/browse/YARN-2821
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
> YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
> apache-yarn-2821.1.patch
>
>
> We've noticed that once in a while the distributed shell app master becomes 
> unresponsive and is eventually killed by the RM. snippet of the logs -
> {noformat}
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
> appattempt_1415123350094_0017_01 received 0 previous attempts' running 
> containers on AM registration.
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez2:45454
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=1
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_02, 
> containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
> container launch container for 
> containerid=container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> onprem-tez2:45454
> 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> QUERY_CONTAINER for Container container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> onprem-tez2:45454
> 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez3:45454
> 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez4:45454
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=3
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_03, 
> containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_04, 
> containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_05, 
> containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 IN

[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552559#comment-14552559
 ] 

Hudson commented on YARN-3677:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/])
YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by 
Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* hadoop-yarn-project/CHANGES.txt


> Fix findbugs warnings in yarn-server-resourcemanager
> 
>
> Key: YARN-3677
> URL: https://issues.apache.org/jira/browse/YARN-3677
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Akira AJISAKA
>Assignee: Vinod Kumar Vavilapalli
>Priority: Minor
>  Labels: newbie
> Fix For: 2.7.1
>
> Attachments: YARN-3677-20150519.txt
>
>
> There is 1 findbugs warning in FileSystemRMStateStore.java.
> {noformat}
> Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of 
> time
> Unsynchronized access at FileSystemRMStateStore.java: [line 156]
> Field 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS
> Synchronized 66% of the time
> Synchronized access at FileSystemRMStateStore.java: [line 148]
> Synchronized access at FileSystemRMStateStore.java: [line 859]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552548#comment-14552548
 ] 

Hudson commented on YARN-3583:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/])
YARN-3583. Support of NodeLabel object instead of plain String in YarnClient 
side. (Sunil G via wangda) (wangda: rev 
563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java


> Support of NodeLabel object instead of plain String in YarnClient side.
> ---
>
> Key: YARN-3583
> URL: https://issues.apache.org/jira/browse/YARN-3583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 
> 0003-YARN-3583.patch, 0004-YARN-3583.patch
>
>
> Similar to YARN-3521, use NodeLabel objects in YarnClient side apis.
> getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of 
> using plain label name.
> This will help to bring other label details such as Exclusivity to client 
> side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552553#comment-14552553
 ] 

Hudson commented on YARN-3565:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/])
YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel 
object instead of String. (Naganarasimha G R via wangda) (wangda: rev 
b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java


> NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object 
> instead of String
> -
>
> Key: YARN-3565
> URL: https://issues.apache.org/jira/browse/YARN-3565
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, 
> YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch
>
>
> Now NM HB/Register uses Set<String>, it will be hard to add new fields if we 
> want to support specifying NodeLabel type such as exclusivity/constraints, 
> etc. We need to make sure rolling upgrade works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552557#comment-14552557
 ] 

Hudson commented on YARN-3302:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/])
YARN-3302. TestDockerContainerExecutor should run automatically if it can 
detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: 
rev c97f32e7b9d9e1d4c80682cc01741579166174d1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java


> TestDockerContainerExecutor should run automatically if it can detect docker 
> in the usual place
> ---
>
> Key: YARN-3302
> URL: https://issues.apache.org/jira/browse/YARN-3302
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Ravi Prakash
>Assignee: Ravindra Kumar Naik
> Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, 
> YARN-3302-trunk.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-05-20 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552534#comment-14552534
 ] 

Varun Saxena commented on YARN-3051:


Well, I am still stuck on trying to retrieve, in the WebServices class, the 
attribute set via HttpServer2#setAttribute. I will update the patch once that is 
done.
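As a point of reference, here is a minimal sketch of the pattern being attempted: publish an object with HttpServer2#setAttribute and read it back from the ServletContext inside a JAX-RS resource. The attribute name, class names, and the JAX-RS wiring are assumptions for illustration, not taken from the patch.

{code}
import javax.servlet.ServletContext;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Context;

import org.apache.hadoop.http.HttpServer2;

public class ReaderAttributeSketch {

  // Publishing side: store the backing-storage handle on the web server so
  // servlets and JAX-RS resources in the same webapp can look it up later.
  static void publish(HttpServer2 server, Object storageReader) {
    server.setAttribute("timeline.reader.storage", storageReader);
  }

  // Reading side: a JAX-RS resource pulls the attribute out of the
  // ServletContext backing the HttpServer2 webapp.
  @Path("/sketch")
  public static class ReaderResource {
    @Context
    private ServletContext context;

    @GET
    public String ping() {
      Object storageReader = context.getAttribute("timeline.reader.storage");
      return storageReader == null ? "storage attribute not set" : "ok";
    }
  }
}
{code}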

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, 
> YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552436#comment-14552436
 ] 

Hudson commented on YARN-3583:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/])
YARN-3583. Support of NodeLabel object instead of plain String in YarnClient 
side. (Sunil G via wangda) (wangda: rev 
563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


> Support of NodeLabel object instead of plain String in YarnClient side.
> ---
>
> Key: YARN-3583
> URL: https://issues.apache.org/jira/browse/YARN-3583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 
> 0003-YARN-3583.patch, 0004-YARN-3583.patch
>
>
> Similar to YARN-3521, use NodeLabel objects in the YarnClient-side APIs.
> The getLabelsToNodes/getNodeToLabels APIs can use NodeLabel objects instead of 
> plain label names.
> This will help bring other label details, such as exclusivity, to the client 
> side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552447#comment-14552447
 ] 

Hudson commented on YARN-3677:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/])
YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by 
Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* hadoop-yarn-project/CHANGES.txt


> Fix findbugs warnings in yarn-server-resourcemanager
> 
>
> Key: YARN-3677
> URL: https://issues.apache.org/jira/browse/YARN-3677
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Akira AJISAKA
>Assignee: Vinod Kumar Vavilapalli
>Priority: Minor
>  Labels: newbie
> Fix For: 2.7.1
>
> Attachments: YARN-3677-20150519.txt
>
>
> There is 1 findbugs warning in FileSystemRMStateStore.java.
> {noformat}
> Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of 
> time
> Unsynchronized access at FileSystemRMStateStore.java: [line 156]
> Field 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS
> Synchronized 66% of the time
> Synchronized access at FileSystemRMStateStore.java: [line 148]
> Synchronized access at FileSystemRMStateStore.java: [line 859]
> {noformat}
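For context, the usual ways to clear this class of findbugs warning are to make the flag volatile or to route every access through synchronized methods. The sketch below only illustrates those two options; it is not the actual YARN-3677 patch.

{code}
// Illustration of the general fix for an "inconsistent synchronization" warning;
// the field name mirrors the warning above, but this is not the actual patch.
public class StateStoreFlagSketch {

  // Option 1: mark the flag volatile so unsynchronized readers still see a
  // consistent, up-to-date value.
  private volatile boolean isHDFS;

  // Option 2: funnel every read and write through synchronized accessors.
  public synchronized boolean isHDFS() {
    return isHDFS;
  }

  public synchronized void setHDFS(boolean value) {
    this.isHDFS = value;
  }
}
{code}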



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552441#comment-14552441
 ] 

Hudson commented on YARN-3565:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/])
YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel 
object instead of String. (Naganarasimha G R via wangda) (wangda: rev 
b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto


> NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object 
> instead of String
> -
>
> Key: YARN-3565
> URL: https://issues.apache.org/jira/browse/YARN-3565
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, 
> YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch
>
>
> Now NM HB/Register uses Set<String>; it will be hard to add new fields if we 
> want to support specifying NodeLabel attributes such as exclusivity/constraints, 
> etc. We need to make sure rolling upgrade works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3344) procfs stat file is not in the expected format warning

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552431#comment-14552431
 ] 

Hadoop QA commented on YARN-3344:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  2s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| | |  39m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734126/YARN-3344-trunk.005.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4aa730c |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8024/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8024/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8024/console |


This message was automatically generated.

> procfs stat file is not in the expected format warning
> --
>
> Key: YARN-3344
> URL: https://issues.apache.org/jira/browse/YARN-3344
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jon Bringhurst
>Assignee: Ravindra Kumar Naik
> Attachments: YARN-3344-trunk.005.patch
>
>
> Although this doesn't appear to be causing any functional issues, it is 
> spamming our log files quite a bit. :)
> It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
> /proc/<pid>/stat files.
> Here's the error I'm seeing:
> {noformat}
> "source_host": "asdf",
> "method": "constructProcessInfo",
> "level": "WARN",
> "message": "Unexpected: procfs stat file is not in the expected format 
> for process with pid 6953"
> "file": "ProcfsBasedProcessTree.java",
> "line_number": "514",
> "class": "org.apache.hadoop.yarn.util.ProcfsBasedProcessTree",
> {noformat}
> And here's the basic info on process with pid 6953:
> {noformat}
> [asdf ~]$ cat /proc/6953/stat
> 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 
> 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 
> 2 18446744073709551615 0 0 17 13 0 0 0 0 0
> [asdf ~]$ ps aux|grep 6953
> root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
> /export/apps/salt/minion-scripts/module-sync.py
> jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
> [asdf ~]$ 
> {noformat}
> This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5.
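The comm field in /proc/<pid>/stat may itself contain spaces, which is what trips up a naive whitespace-based pattern here. A common, tolerant way to parse the line (a general sketch, not the regex ProcfsBasedProcessTree actually uses) is to treat everything between the first '(' and the last ')' as the command name:

{code}
// General sketch of tolerant /proc/<pid>/stat parsing; not the Hadoop code.
public class ProcStatParseSketch {
  public static void main(String[] args) {
    String stat = "6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 "
        + "20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 "
        + "2 18446744073709551615 0 0 17 13 0 0 0 0 0";

    int open = stat.indexOf('(');
    int close = stat.lastIndexOf(')');          // last ')' guards against ')' in the name
    String pid = stat.substring(0, open).trim();
    String comm = stat.substring(open + 1, close);
    String[] rest = stat.substring(close + 2).split("\\s+");

    System.out.println("pid=" + pid + ", comm=" + comm);
    System.out.println("state=" + rest[0] + ", ppid=" + rest[1]);
    System.out.println("field count after comm: " + rest.length);
  }
}
{code}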



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552445#comment-14552445
 ] 

Hudson commented on YARN-3302:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/])
YARN-3302. TestDockerContainerExecutor should run automatically if it can 
detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: 
rev c97f32e7b9d9e1d4c80682cc01741579166174d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt


> TestDockerContainerExecutor should run automatically if it can detect docker 
> in the usual place
> ---
>
> Key: YARN-3302
> URL: https://issues.apache.org/jira/browse/YARN-3302
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Ravi Prakash
>Assignee: Ravindra Kumar Naik
> Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, 
> YARN-3302-trunk.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552437#comment-14552437
 ] 

Hudson commented on YARN-3601:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/])
YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei 
Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/CHANGES.txt


> Fix UT TestRMFailover.testRMWebAppRedirect
> --
>
> Key: YARN-3601
> URL: https://issues.apache.org/jira/browse/YARN-3601
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
> Environment: Red Hat Enterprise Linux Workstation release 6.5 
> (Santiago)
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
>  Labels: test
> Fix For: 2.7.1
>
> Attachments: YARN-3601.001.patch
>
>
> This test case has not been working since the commit from YARN-2605; it fails 
> with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552442#comment-14552442
 ] 

Hudson commented on YARN-2821:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java


> Distributed shell app master becomes unresponsive sometimes
> ---
>
> Key: YARN-2821
> URL: https://issues.apache.org/jira/browse/YARN-2821
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
> YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
> apache-yarn-2821.1.patch
>
>
> We've noticed that once in a while the distributed shell app master becomes 
> unresponsive and is eventually killed by the RM. A snippet of the logs:
> {noformat}
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
> appattempt_1415123350094_0017_01 received 0 previous attempts' running 
> containers on AM registration.
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez2:45454
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=1
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_02, 
> containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
> container launch container for 
> containerid=container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> onprem-tez2:45454
> 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> QUERY_CONTAINER for Container container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> onprem-tez2:45454
> 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez3:45454
> 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez4:45454
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=3
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_03, 
> containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_04, 
> containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_05, 
> containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distrib

[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2015-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552416#comment-14552416
 ] 

Karthik Kambatla commented on YARN-314:
---

Discussed this with [~asuresh] offline. We were wondering whether AppSchedulingInfo 
should be supplemented (or replaced) by another singleton data structure that 
captures pending requests and maintains multiple maps to index these requests by 
both apps and nodes/racks. We should of course add other convenience methods to 
add/remove or query these requests. 
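To make the idea concrete, here is a rough sketch of such a structure; the class, field, and method names are invented for illustration and are not part of YARN.

{code}
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: indexes pending requests both by application and by
// node/rack so a scheduler can look them up either way.
public final class PendingRequestIndex {

  // Placeholder for whatever record the scheduler keeps per outstanding ask.
  public static final class PendingRequest {
    final String appId;
    final String location;   // node or rack name
    final int memoryMb;
    final int vcores;
    PendingRequest(String appId, String location, int memoryMb, int vcores) {
      this.appId = appId;
      this.location = location;
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  private final Map<String, Set<PendingRequest>> byApp = new ConcurrentHashMap<>();
  private final Map<String, Set<PendingRequest>> byLocation = new ConcurrentHashMap<>();

  public void add(PendingRequest r) {
    byApp.computeIfAbsent(r.appId, k -> ConcurrentHashMap.newKeySet()).add(r);
    byLocation.computeIfAbsent(r.location, k -> ConcurrentHashMap.newKeySet()).add(r);
  }

  public void remove(PendingRequest r) {
    byApp.getOrDefault(r.appId, Collections.emptySet()).remove(r);
    byLocation.getOrDefault(r.location, Collections.emptySet()).remove(r);
  }

  public Collection<PendingRequest> pendingForApp(String appId) {
    return byApp.getOrDefault(appId, Collections.emptySet());
  }

  public Collection<PendingRequest> pendingForLocation(String nodeOrRack) {
    return byLocation.getOrDefault(nodeOrRack, Collections.emptySet());
  }
}
{code}

Whether something like this lives beside AppSchedulingInfo or replaces it, the interesting part is keeping the two indexes consistent under concurrent add/remove.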

> Schedulers should allow resource requests of different sizes at the same 
> priority and location
> --
>
> Key: YARN-314
> URL: https://issues.apache.org/jira/browse/YARN-314
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
> Attachments: yarn-314-prelim.patch
>
>
> Currently, resource requests for the same container and locality are expected 
> to all be the same size.
> While it doesn't look like it's needed for apps currently, and can be 
> circumvented by specifying different priorities if absolutely necessary, it 
> seems to me that the ability to request containers with different resource 
> requirements at the same priority level should be there for the future and 
> for completeness sake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552404#comment-14552404
 ] 

Hadoop QA commented on YARN-3646:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 34s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 38s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m  6s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 51s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 55s | Tests passed in 
hadoop-yarn-common. |
| | |  45m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734115/YARN-3646.002.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 4aa730c |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8023/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8023/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8023/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8023/console |


This message was automatically generated.

> Applications are getting stuck some times in case of retry policy forever
> -
>
> Key: YARN-3646
> URL: https://issues.apache.org/jira/browse/YARN-3646
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Raju Bairishetti
> Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch
>
>
> We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
> retry policy.
> The YARN client retries infinitely on exceptions from the RM because it uses 
> the FOREVER retry policy. The problem is that it retries for all kinds of 
> exceptions (like ApplicationNotFoundException), even when the failure is not a 
> connection failure. Because of this, my application does not progress further.
> *The YARN client should not retry infinitely for non-connection failures.*
> We have written a simple YARN client which tries to get an application report 
> for an invalid or older appId. The ResourceManager throws an 
> ApplicationNotFoundException because the appId is invalid or old, but due to 
> the FOREVER retry policy the client keeps retrying to get the application 
> report and the ResourceManager keeps throwing ApplicationNotFoundException.
> {code}
> private void testYarnClientRetryPolicy() throws  Exception{
> YarnConfiguration conf = new YarnConfiguration();
> conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, 
> -1);
> YarnClient yarnClient = YarnClient.createYarnClient();
> yarnClient.init(conf);
> yarnClient.start();
> ApplicationId appId = ApplicationId.newInstance(1430126768987L, 
> 10645);
> ApplicationReport report = yarnClient.getApplicationReport(appId);
> }
> {code}
> *RM logs:*
> {noformat}
> 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 10.14.120.231:61621 Call#875162 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1430126768987_10645' doesn't exist in RM.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.servic

[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552392#comment-14552392
 ] 

Hudson commented on YARN-2821:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/])
YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. 
Contributed by Varun Vasudev (jianhe: rev 
7438966586f1896ab3e8b067d47a4af28a894106)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java


> Distributed shell app master becomes unresponsive sometimes
> ---
>
> Key: YARN-2821
> URL: https://issues.apache.org/jira/browse/YARN-2821
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: YARN-2821.002.patch, YARN-2821.003.patch, 
> YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, 
> apache-yarn-2821.1.patch
>
>
> We've noticed that once in a while the distributed shell app master becomes 
> unresponsive and is eventually killed by the RM. A snippet of the logs:
> {noformat}
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: 
> appattempt_1415123350094_0017_01 received 0 previous attempts' running 
> containers on AM registration.
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[]Priority[0]
> 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez2:45454
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=1
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_02, 
> containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up 
> container launch container for 
> containerid=container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> onprem-tez2:45454
> 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> QUERY_CONTAINER for Container container_1415123350094_0017_01_02
> 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> onprem-tez2:45454
> 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez3:45454
> 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : 
> onprem-tez4:45454
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=3
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_03, 
> containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_04, 
> containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_1415123350094_0017_01_05, 
> containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 14/11/04 18:21:39 INFO distributedshell.

[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552386#comment-14552386
 ] 

Hudson commented on YARN-3583:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/])
YARN-3583. Support of NodeLabel object instead of plain String in YarnClient 
side. (Sunil G via wangda) (wangda: rev 
563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java


> Support of NodeLabel object instead of plain String in YarnClient side.
> ---
>
> Key: YARN-3583
> URL: https://issues.apache.org/jira/browse/YARN-3583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 
> 0003-YARN-3583.patch, 0004-YARN-3583.patch
>
>
> Similar to YARN-3521, use NodeLabel objects in the YarnClient-side APIs.
> The getLabelsToNodes/getNodeToLabels APIs can use NodeLabel objects instead of 
> plain label names.
> This will help bring other label details, such as exclusivity, to the client 
> side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552395#comment-14552395
 ] 

Hudson commented on YARN-3302:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/])
YARN-3302. TestDockerContainerExecutor should run automatically if it can 
detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: 
rev c97f32e7b9d9e1d4c80682cc01741579166174d1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt


> TestDockerContainerExecutor should run automatically if it can detect docker 
> in the usual place
> ---
>
> Key: YARN-3302
> URL: https://issues.apache.org/jira/browse/YARN-3302
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Ravi Prakash
>Assignee: Ravindra Kumar Naik
> Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, 
> YARN-3302-trunk.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552397#comment-14552397
 ] 

Hudson commented on YARN-3677:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/])
YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by 
Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* hadoop-yarn-project/CHANGES.txt


> Fix findbugs warnings in yarn-server-resourcemanager
> 
>
> Key: YARN-3677
> URL: https://issues.apache.org/jira/browse/YARN-3677
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Akira AJISAKA
>Assignee: Vinod Kumar Vavilapalli
>Priority: Minor
>  Labels: newbie
> Fix For: 2.7.1
>
> Attachments: YARN-3677-20150519.txt
>
>
> There is 1 findbugs warning in FileSystemRMStateStore.java.
> {noformat}
> Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of 
> time
> Unsynchronized access at FileSystemRMStateStore.java: [line 156]
> Field 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS
> Synchronized 66% of the time
> Synchronized access at FileSystemRMStateStore.java: [line 148]
> Synchronized access at FileSystemRMStateStore.java: [line 859]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552391#comment-14552391
 ] 

Hudson commented on YARN-3565:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/])
YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel 
object instead of String. (Naganarasimha G R via wangda) (wangda: rev 
b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java


> NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object 
> instead of String
> -
>
> Key: YARN-3565
> URL: https://issues.apache.org/jira/browse/YARN-3565
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, 
> YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch
>
>
> Now NM HB/Register uses Set<String>; it will be hard to add new fields if we 
> want to support specifying NodeLabel attributes such as exclusivity/constraints, 
> etc. We need to make sure rolling upgrade works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552387#comment-14552387
 ] 

Hudson commented on YARN-3601:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/])
YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei 
Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/CHANGES.txt


> Fix UT TestRMFailover.testRMWebAppRedirect
> --
>
> Key: YARN-3601
> URL: https://issues.apache.org/jira/browse/YARN-3601
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
> Environment: Red Hat Enterprise Linux Workstation release 6.5 
> (Santiago)
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
>  Labels: test
> Fix For: 2.7.1
>
> Attachments: YARN-3601.001.patch
>
>
> This test case has not been working since the commit from YARN-2605; it fails 
> with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2015-05-20 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552375#comment-14552375
 ] 

MENG DING commented on YARN-1902:
-

I have been experimenting with the idea of changing AppSchedulingInfo to 
maintain a total request table and a fulfilled allocation table, and then 
calculate the difference of the two tables as the real outstanding request 
table used for scheduling. All was fine until I realized that this cannot handle 
one use case: an AMRMClient, right before sending the allocation heartbeat, 
removes all container requests and adds new container requests at the same 
priority and location (possibly with a different resource capability). 
AppSchedulingInfo does not know about this, and may not treat the newly added 
container requests as outstanding requests.

I agree that I currently do not see a clean solution that does not affect 
backward compatibility. 
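To make the bookkeeping concrete, here is a toy sketch of the "requested minus fulfilled" idea; the class name and the (priority, location, capability) key shape are invented for illustration, and this is not AppSchedulingInfo code.

{code}
import java.util.HashMap;
import java.util.Map;

// Toy model of "outstanding = requested - fulfilled", keyed by priority,
// location (node/rack/ANY) and capability.
public class OutstandingRequestSketch {
  private final Map<String, Integer> requested = new HashMap<>();
  private final Map<String, Integer> fulfilled = new HashMap<>();

  private static String key(int priority, String location, int memoryMb, int vcores) {
    return priority + "/" + location + "/" + memoryMb + "x" + vcores;
  }

  public void addRequests(int priority, String location, int memoryMb, int vcores, int count) {
    requested.merge(key(priority, location, memoryMb, vcores), count, Integer::sum);
  }

  public void removeRequests(int priority, String location, int memoryMb, int vcores, int count) {
    requested.merge(key(priority, location, memoryMb, vcores), -count, Integer::sum);
  }

  public void recordAllocation(int priority, String location, int memoryMb, int vcores) {
    fulfilled.merge(key(priority, location, memoryMb, vcores), 1, Integer::sum);
  }

  // Outstanding count for a key; never negative.
  public int outstanding(int priority, String location, int memoryMb, int vcores) {
    String k = key(priority, location, memoryMb, vcores);
    return Math.max(0, requested.getOrDefault(k, 0) - fulfilled.getOrDefault(k, 0));
  }
}
{code}

If a client removes its z asks and re-adds z fresh ones at the same key before the next heartbeat, {{requested}} ends up back at z while {{fulfilled}} still counts the earlier allocations, so the diff under-counts the genuinely new asks, which is roughly the failure case described above.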

> Allocation of too many containers when a second request is done with the same 
> resource capability
> -
>
> Key: YARN-1902
> URL: https://issues.apache.org/jira/browse/YARN-1902
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Sietse T. Au
>Assignee: Sietse T. Au
>  Labels: client
> Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y, when addContainerRequest is 
> called z times with x, allocate is called and at least one of the z allocated 
> containers is started, then if another addContainerRequest call is done and 
> subsequently an allocate call to the RM, (z+1) containers will be allocated, 
> where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls. 
> Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) 
> containers are requested in both scenarios, but that only in the second 
> scenario is the correct behavior observed.
> Looking at the implementation, I have found that this (z+1) request is caused 
> by the structure of the remoteRequestsTable. The consequence of 
> Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold 
> any information about whether a request has been sent to the RM yet or not.
> There are workarounds for this, such as releasing the excess containers 
> received.
> The solution implemented is to initialize a new ResourceRequest in 
> ResourceRequestInfo when a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

