[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553708#comment-14553708 ]

Hadoop QA commented on YARN-3655:
---------------------------------

| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 58s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 50m 22s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 87m 39s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734328/YARN-3655.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fb6b38d |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8037/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8037/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8037/console |

This message was automatically generated.

> FairScheduler: potential livelock due to maxAMShare limitation and container reservation
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-3655
>                 URL: https://issues.apache.org/jira/browse/YARN-3655
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-3655.000.patch, YARN-3655.001.patch, YARN-3655.002.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container reservation.
> If a node is reserved by an application, no other application has any chance to assign a new container on that node until the application that holds the reservation either assigns a new container on the node or releases the reserved container.
> The problem is that if an application calls assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it blocks all other applications from using the nodes it has reserved. If all of the other running applications cannot release their AM containers because they are blocked by these reserved containers, a livelock can happen.
> The following code in FSAppAttempt#assignContainer can cause this potential livelock.
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>       ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would " +
>           "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we cannot allocate the AM container on it due to the maxAMShare limitation and the node is reserved by the application.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
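[Editor's note] For readers following the discussion, a minimal sketch of the unreserve approach described above is shown below. It is illustrative only, not the attached YARN-3655 patch; the reservation lookup and the unreserve call are assumptions about how the check could be wired into FSAppAttempt#assignContainer.

{code}
// Sketch (not the committed patch): when the maxAMShare check blocks an AM
// allocation, also drop this attempt's reservation on the node so other
// applications are not starved of it.
if (!isAmRunning() && !getUnmanagedAM()) {
  List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
  if (ask.isEmpty() || !getQueue().canRunAppAM(ask.get(0).getCapability())) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping allocation because maxAMShare limit would be exceeded");
    }
    // Assumed wiring: if this attempt currently holds the reservation on
    // 'node', release it instead of leaving the node blocked indefinitely.
    RMContainer reserved = node.getReservedContainer();
    if (reserved != null
        && reserved.getApplicationAttemptId().equals(getApplicationAttemptId())) {
      unreserve(reserved.getReservedPriority(), node);
    }
    return Resources.none();
  }
}
{code}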
[jira] [Updated] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-1012: -- Attachment: YARN-1012-4.patch Added missing files. Fixed some of the comments. > NM should report resource utilization of running containers to RM in heartbeat > -- > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Attachments: YARN-1012-1.patch, YARN-1012-2.patch, YARN-1012-3.patch, > YARN-1012-4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553688#comment-14553688 ] Inigo Goiri commented on YARN-1012: --- I don't know how I missed the missing files... I've been checking this for days. Fixed now. Agreed and fixed 1, 2, 3, and 4. I don't know what to do with 5... your call. > NM should report resource utilization of running containers to RM in heartbeat > -- > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Attachments: YARN-1012-1.patch, YARN-1012-2.patch, YARN-1012-3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553646#comment-14553646 ]

Karthik Kambatla commented on YARN-1012:
----------------------------------------

Looks like the patch is missing ResourceUtilizationPBImpl, and hence doesn't build. Could you please include those new files as well?

Comments on the patch itself:
# Given this is all new code, let us hold off on exposing it to end users just yet. Can we mark ContainerStatus#getUtilization Public-Unstable?
# Is there a reason folks would want to turn off tracking utilization? If not, let us get rid of the config and always track it.
# When logging at debug level, we want to check whether debug logging is enabled to avoid string creation and concatenation.
# I notice that we are using a float for virtual_cores. Do we anticipate using this value in any calculations? If yes, should we change it to an int of millivcores instead to avoid those floating-point operations? Given this is just tracking utilization, I suspect we won't do any calculations.
# In ContainersMonitorImpl, we save utilization and then set container metrics. Should we leave this as is, or link them up so that ContainersMonitorImpl is aware of only one of them?

> NM should report resource utilization of running containers to RM in heartbeat
> -------------------------------------------------------------------------------
>
>                 Key: YARN-1012
>                 URL: https://issues.apache.org/jira/browse/YARN-1012
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Arun C Murthy
>            Assignee: Inigo Goiri
>         Attachments: YARN-1012-1.patch, YARN-1012-2.patch, YARN-1012-3.patch
>
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
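[Editor's note] The guard Karthik refers to in review comment #3 is the standard commons-logging idiom; a small illustrative snippet follows (the variable names are made up, not from the patch):

{code}
// Only build the debug message when debug logging is actually enabled,
// so the string concatenation is skipped in normal operation.
if (LOG.isDebugEnabled()) {
  LOG.debug("Container " + containerId + " reported utilization " + utilization);
}
{code}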
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553643#comment-14553643 ]

Sunil G commented on YARN-2005:
-------------------------------

Hi [~adhoot],
I had started working on this a little earlier and did some analysis on it. Please feel free to start working on it, and I can help with the reviews. If work on any other sub-parts is needed to finish it, please let me know and I can give you a hand. Thank you.

> Blacklisting support for scheduling AMs
> ---------------------------------------
>
>                 Key: YARN-2005
>                 URL: https://issues.apache.org/jira/browse/YARN-2005
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 0.23.10, 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Anubhav Dhoot
>
> It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553636#comment-14553636 ]

zhihai xu commented on YARN-3655:
---------------------------------

Thanks [~asuresh] for the review. I think the flip-flop won't happen.
bq. At some time T2, the next allocation event (after all nodes have sent heartbeat.. or after a continuousScheduling attempt) happens, a reservation of 2GB is made on each node for appX.
The above reservation won't succeed because of the maxAMShare limitation. If it did succeed, the reservation for appX wouldn't be removed.

Thanks [~kasha] for your review; these are great suggestions, and I made the changes based on them. I also fixed the fitsInMaxShare issue in this JIRA instead of creating a follow-up JIRA, and did some optimizations to remove duplicate logic: hasContainerForNode already covers getTotalRequiredResources, so if we check hasContainerForNode we don't need to check getTotalRequiredResources, and I removed the getTotalRequiredResources check from assignReservedContainer and assignContainer. Also, because okToUnreserve already checks hasContainerForNode, we don't need to check it again for the reserved container in assignContainer. I uploaded a new patch, YARN-3655.002.patch, with the above changes.

> FairScheduler: potential livelock due to maxAMShare limitation and container reservation
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-3655
>                 URL: https://issues.apache.org/jira/browse/YARN-3655
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-3655.000.patch, YARN-3655.001.patch, YARN-3655.002.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container reservation.
> If a node is reserved by an application, no other application has any chance to assign a new container on that node until the application that holds the reservation either assigns a new container on the node or releases the reserved container.
> The problem is that if an application calls assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it blocks all other applications from using the nodes it has reserved. If all of the other running applications cannot release their AM containers because they are blocked by these reserved containers, a livelock can happen.
> The following code in FSAppAttempt#assignContainer can cause this potential livelock.
> {code}
> // Check the AM resource usage for the leaf queue
> if (!isAmRunning() && !getUnmanagedAM()) {
>   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>   if (ask.isEmpty() || !getQueue().canRunAppAM(
>       ask.get(0).getCapability())) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Skipping allocation because maxAMShare limit would " +
>           "be exceeded");
>     }
>     return Resources.none();
>   }
> }
> {code}
> To fix this issue, we can unreserve the node if we cannot allocate the AM container on it due to the maxAMShare limitation and the node is reserved by the application.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3655: Attachment: YARN-3655.002.patch > FairScheduler: potential livelock due to maxAMShare limitation and container > reservation > - > > Key: YARN-3655 > URL: https://issues.apache.org/jira/browse/YARN-3655 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3655.000.patch, YARN-3655.001.patch, > YARN-3655.002.patch > > > FairScheduler: potential livelock due to maxAMShare limitation and container > reservation. > If a node is reserved by an application, all the other applications don't > have any chance to assign a new container on this node, unless the > application which reserves the node assigns a new container on this node or > releases the reserved container on this node. > The problem is if an application tries to call assignReservedContainer and > fail to get a new container due to maxAMShare limitation, it will block all > other applications to use the nodes it reserves. If all other running > applications can't release their AM containers due to being blocked by these > reserved containers. A livelock situation can happen. > The following is the code at FSAppAttempt#assignContainer which can cause > this potential livelock. > {code} > // Check the AM resource usage for the leaf queue > if (!isAmRunning() && !getUnmanagedAM()) { > List ask = appSchedulingInfo.getAllResourceRequests(); > if (ask.isEmpty() || !getQueue().canRunAppAM( > ask.get(0).getCapability())) { > if (LOG.isDebugEnabled()) { > LOG.debug("Skipping allocation because maxAMShare limit would " + > "be exceeded"); > } > return Resources.none(); > } > } > {code} > To fix this issue, we can unreserve the node if we can't allocate the AM > container on the node due to Max AM share limitation and the node is reserved > by the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3645) ResourceManager can't start success if attribute value of "aclSubmitApps" is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553598#comment-14553598 ]

Hadoop QA commented on YARN-3645:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 14s | The applied patch generated 5 new checkstyle issues (total was 27, now 28). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 50m 8s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 87m 14s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734306/YARN-3645.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6329bd0 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8036/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8036/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8036/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8036/console |

This message was automatically generated.
> ResourceManager can't start success if attribute value of "aclSubmitApps" is > null in fair-scheduler.xml > > > Key: YARN-3645 > URL: https://issues.apache.org/jira/browse/YARN-3645 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.5.2 >Reporter: zhoulinlin > Attachments: YARN-3645.1.patch, YARN-3645.patch > > > The "aclSubmitApps" is configured in fair-scheduler.xml like below: > > > > The resourcemanager log: > 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: > Service ResourceManager failed in state INITED; cause: > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) > Caused by: java.io.IOException: Failed to initialize FairScheduler > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 7 more > Caused by: java.lang.NullPointerException > a
[jira] [Assigned] (YARN-3692) Allow REST API to set a user generated message when killing an application
[ https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-3692: Assignee: Rohith > Allow REST API to set a user generated message when killing an application > -- > > Key: YARN-3692 > URL: https://issues.apache.org/jira/browse/YARN-3692 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Rajat Jain >Assignee: Rohith > > Currently YARN's REST API supports killing an application without setting a > diagnostic message. It would be good to provide that support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash
[ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lachisis updated YARN-3614: --- Attachment: YARN-3614-1.patch > FileSystemRMStateStore throw exception when failed to remove application, > that cause resourcemanager to crash > - > > Key: YARN-3614 > URL: https://issues.apache.org/jira/browse/YARN-3614 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0, 2.7.0 >Reporter: lachisis >Priority: Critical > Labels: patch > Fix For: 2.7.1 > > Attachments: YARN-3614-1.patch > > > FileSystemRMStateStore is only a accessorial plug-in of rmstore. > When it failed to remove application, I think warning is enough, but now > resourcemanager crashed. > Recently, I configure > "yarn.resourcemanager.state-store.max-completed-applications" to limit > applications number in rmstore. when applications number exceed the limit, > some old applications will be removed. If failed to remove, resourcemanager > will crash. > The following is log: > 2015-05-11 06:58:43,815 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing > info for app: application_1430994493305_0053 > 2015-05-11 06:58:43,815 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Removing info for app: application_1430994493305_0053 at: > /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 > 2015-05-11 06:58:43,816 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > removing app: application_1430994493305_0053 > java.lang.Exception: Failed to delete > /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > 2015-05-11 06:58:43,819 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a > org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. 
Cause: > java.lang.Exception: Failed to delete > /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053 > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMa
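[Editor's note] The behavior lachisis suggests (warn instead of crash) could look roughly like the sketch below. This is a hypothetical illustration, not the attached YARN-3614-1.patch; deleteFile is the method from the stack trace above, and the other names are illustrative.

{code}
// Hypothetical sketch of "warn instead of fail fatally" when a delete fails in
// FileSystemRMStateStore.removeApplicationStateInternal:
try {
  deleteFile(nodeRemovePath);
} catch (Exception e) {
  // Log and continue instead of letting the exception propagate and trigger a
  // fatal STATE_STORE_OP_FAILED event that brings down the ResourceManager.
  LOG.warn("Failed to remove state for app " + appId + ", ignoring.", e);
}
{code}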
[jira] [Updated] (YARN-3645) ResourceManager can't start success if attribute value of "aclSubmitApps" is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Liptak updated YARN-3645: --- Attachment: YARN-3645.1.patch > ResourceManager can't start success if attribute value of "aclSubmitApps" is > null in fair-scheduler.xml > > > Key: YARN-3645 > URL: https://issues.apache.org/jira/browse/YARN-3645 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.5.2 >Reporter: zhoulinlin > Attachments: YARN-3645.1.patch, YARN-3645.patch > > > The "aclSubmitApps" is configured in fair-scheduler.xml like below: > > > > The resourcemanager log: > 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: > Service ResourceManager failed in state INITED; cause: > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) > Caused by: java.io.IOException: Failed to initialize FairScheduler > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 7 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:458) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:337) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1299) > ... 
9 more > 2015-05-14 12:59:48,623 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to standby state > 2015-05-14 12:59:48,623 INFO > com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory: plugin > transitionToStandbyIn > 2015-05-14 12:59:48,623 WARN org.apache.hadoop.service.AbstractService: When > stopping the service ResourceManager : java.lang.NullPointerException > java.lang.NullPointerException > at > com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory.transitionToStandbyIn(YarnPlatformPluginProxyFactory.java:71) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:997) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1058) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) > 2015-05-14 12:59:48,623 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.resourcemanager.
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553482#comment-14553482 ] Anubhav Dhoot commented on YARN-3675: - Failure does not repro locally for me and seems unrelated > FairScheduler: RM quits when node removal races with continousscheduling on > the same node > - > > Key: YARN-3675 > URL: https://issues.apache.org/jira/browse/YARN-3675 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3675.001.patch, YARN-3675.002.patch, > YARN-3675.003.patch > > > With continuous scheduling, scheduling can be done on a node thats just > removed causing errors like below. > {noformat} > 12:28:53.782 AM FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager > Error in handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 12:28:53.783 AMINFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553451#comment-14553451 ] Jun Gong commented on YARN-3480: {quote} Without doing this, we will unnecessarily be forcing apps to lose history simply because the platform cannot recover quickly enough. Thinking more, how about we only have (limits + asynchronous recovery) for services, once YARN-1039 goes in? Non-service apps anyways are not expected to have a lot of app-attempts. {quote} It is reasonable. I will update the patch once YARN-1039 goes in. > Recovery may get very slow with lots of services with lots of app-attempts > -- > > Key: YARN-3480 > URL: https://issues.apache.org/jira/browse/YARN-3480 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3480.01.patch, YARN-3480.02.patch, > YARN-3480.03.patch, YARN-3480.04.patch > > > When RM HA is enabled and running containers are kept across attempts, apps > are more likely to finish successfully with more retries(attempts), so it > will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However > it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make > RM recover process much slower. It might be better to set max attempts to be > stored in RMStateStore. > BTW: When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to > a small value, retried attempts might be very large. So we need to delete > some attempts stored in RMStateStore and RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3654) ContainerLogsPage web UI should not have meta-refresh
[ https://issues.apache.org/jira/browse/YARN-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553424#comment-14553424 ] Hudson commented on YARN-3654: -- FAILURE: Integrated in Hadoop-trunk-Commit #7877 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7877/]) YARN-3654. ContainerLogsPage web UI should not have meta-refresh. Contributed by Xuan Gong (jianhe: rev 6329bd00fa1f17cc9555efa496ea7607ad93e0ce) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebAppFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java > ContainerLogsPage web UI should not have meta-refresh > - > > Key: YARN-3654 > URL: https://issues.apache.org/jira/browse/YARN-3654 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3654.1.patch, YARN-3654.2.patch > > > Currently, When we try to find the container logs for the finished > application, it will re-direct to the url which we re-configured for > yarn.log.server.url in yarn-site.xml. But in ContainerLogsPage, we are using > meta-refresh: > {code} > set(TITLE, join("Redirecting to log server for ", $(CONTAINER_ID))); > html.meta_http("refresh", "1; url=" + redirectUrl); > {code} > which is not good for some browsers which need to enable the meta-refresh in > their security setting, especially for IE which meta-refresh is considered a > security hole. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
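[Editor's note] For context on the meta-refresh concern, one common alternative is a plain HTTP redirect. The sketch below assumes the redirect URL has already been built from yarn.log.server.url; it is illustrative and not necessarily the committed YARN-3654 change.

{code}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

public final class LogServerRedirect {
  private LogServerRedirect() {
  }

  // Send an HTTP 302 redirect to the log server instead of relying on a
  // meta-refresh tag, which some browsers block for security reasons.
  public static void redirect(HttpServletResponse response, String redirectUrl)
      throws IOException {
    response.sendRedirect(redirectUrl);
  }
}
{code}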
[jira] [Updated] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3505: Target Version/s: 2.8.0 (was: 2.8.0, 2.7.1) > Node's Log Aggregation Report with SUCCEED should not cached in RMApps > -- > > Key: YARN-3505 > URL: https://issues.apache.org/jira/browse/YARN-3505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Affects Versions: 2.8.0 >Reporter: Junping Du >Assignee: Xuan Gong >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3505.1.patch, YARN-3505.2.patch, > YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, > YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch > > > Per discussions in YARN-1402, we shouldn't cache all node's log aggregation > reports in RMApps for always, especially for those finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553384#comment-14553384 ] Karthik Kambatla commented on YARN-2942: Thanks everyone for the discussion. Clearly, there are trade-offs to make between (1) a single aggregation across nodes for an application with a slightly higher chance of losing a container's logs if a node were to go down vs (2) a two-step aggregation that places more load on HDFS. While looking at this trade-off, we should consider HDFS state today and possible improvements in the future. If HDFS were to support concurrent-append, option 1 seems like a better approach. > Aggregated Log Files should be combined > --- > > Key: YARN-2942 > URL: https://issues.apache.org/jira/browse/YARN-2942 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: CombinedAggregatedLogsProposal_v3.pdf, > CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, > CompactedAggregatedLogsProposal_v1.pdf, > CompactedAggregatedLogsProposal_v2.pdf, > ConcatableAggregatedLogsProposal_v4.pdf, > ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, > YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, > YARN-2942.003.patch > > > Turning on log aggregation allows users to easily store container logs in > HDFS and subsequently view them in the YARN web UIs from a central place. > Currently, there is a separate log file for each Node Manager. This can be a > problem for HDFS if you have a cluster with many nodes as you’ll slowly start > accumulating many (possibly small) files per YARN application. The current > “solution” for this problem is to configure YARN (actually the JHS) to > automatically delete these files after some amount of time. > We should improve this by compacting the per-node aggregated log files into > one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3609) Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.
[ https://issues.apache.org/jira/browse/YARN-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553377#comment-14553377 ]

Hadoop QA commented on YARN-3609:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734279/YARN-3609.3.branch-2.7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8966d42 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8035/console |

This message was automatically generated.

> Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3609
>                 URL: https://issues.apache.org/jira/browse/YARN-3609
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-3609.1.preliminary.patch, YARN-3609.2.patch, YARN-3609.3.branch-2.7.patch, YARN-3609.3.patch
>
>
> Now RMNodeLabelsManager loads label when serviceInit, but RMActiveService.start() is called when RM HA transition happens.
> We haven't done this before because queue's initialization happens in serviceInit as well, we need make sure labels added to system before init queue, after YARN-2918, we should be able to do this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3609) Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.
[ https://issues.apache.org/jira/browse/YARN-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3609: - Attachment: YARN-3609.3.branch-2.7.patch Attached branch-2.7 patch. > Move load labels from storage from serviceInit to serviceStart to make it > works with RM HA case. > > > Key: YARN-3609 > URL: https://issues.apache.org/jira/browse/YARN-3609 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3609.1.preliminary.patch, YARN-3609.2.patch, > YARN-3609.3.branch-2.7.patch, YARN-3609.3.patch > > > Now RMNodeLabelsManager loads label when serviceInit, but > RMActiveService.start() is called when RM HA transition happens. > We haven't done this before because queue's initialization happens in > serviceInit as well, we need make sure labels added to system before init > queue, after YARN-2918, we should be able to do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553369#comment-14553369 ]

Li Lu commented on YARN-3411:
-----------------------------

I looked at the latest patch and think it's in good shape for performance benchmarks. I've also run it with a local single-node HBase cluster, and it worked fine with our performance benchmark application as well as the PI sample application.

> [Storage implementation] explore the native HBase write schema for storage
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553370#comment-14553370 ] Sangjin Lee commented on YARN-3411: --- The latest patch LGTM. I'm fine with having a follow-up JIRA to address Junping's comment (and other minor issues if any). Once everyone chimes in and gives it +1, I'd be happy to commit this patch. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553364#comment-14553364 ] Hadoop QA commented on YARN-3051: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 55s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 39s | The applied patch generated 6 additional warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 2 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 19s | The applied patch generated 23 new checkstyle issues (total was 234, now 257). | | {color:green}+1{color} | shellcheck | 0m 6s | There were no new shellcheck (v0.3.3) issues. | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 36s | The patch appears to introduce 6 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 1m 3s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 43m 47s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-timelineservice | | | Found reliance on default encoding in org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineReaderImpl.getEntities(String, String, String, Long, Long, Long, String, Long, Collection, Collection, Collection, Collection, Collection, EnumSet):in org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineReaderImpl.getEntities(String, String, String, Long, Long, Long, String, Long, Collection, Collection, Collection, Collection, Collection, EnumSet): new java.io.FileReader(File) At FileSystemTimelineReaderImpl.java:[line 88] | | | Found reliance on default encoding in org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineReaderImpl.getEntity(String, String, String, String, Collection, Collection, Long, Long, EnumSet):in org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineReaderImpl.getEntity(String, String, String, String, Collection, Collection, Long, Long, EnumSet): new java.io.FileReader(File) At FileSystemTimelineReaderImpl.java:[line 68] | | FindBugs | module:hadoop-yarn-common | | | Inconsistent synchronization of org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.builder; locked 92% of time Unsynchronized access at AllocateResponsePBImpl.java:92% of time Unsynchronized access at AllocateResponsePBImpl.java:[line 391] | | | Inconsistent synchronization of org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.proto; locked 94% of time Unsynchronized access at AllocateResponsePBImpl.java:94% of time Unsynchronized access at AllocateResponsePBImpl.java:[line 391] | | | Inconsistent synchronization of org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.viaProto; locked 94% of time Unsynchronized access at AllocateResponsePBImpl.java:94% of time Unsynchronized access at AllocateResponsePBImpl.java:[line 391] | | FindBugs | module:hadoop-yarn-api | | | org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric$1.compare(Long, Long) negates the return value of Long.compareTo(Long) At TimelineMetric.java:value of Long.compareTo(Long) At TimelineMetric.java:[line 47] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734255/YARN-3051-YARN-2928.03.patch | | Optional Tests | shellcheck javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8034/artifact/patchprocess/diffJavadocWarnings.txt | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8034/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8034/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8034/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit
[jira] [Commented] (YARN-3609) Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.
[ https://issues.apache.org/jira/browse/YARN-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553359#comment-14553359 ] Hudson commented on YARN-3609: -- FAILURE: Integrated in Hadoop-trunk-Commit #7876 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7876/]) YARN-3609. Load node labels from storage inside RM serviceStart. Contributed by Wangda Tan (jianhe: rev 8966d4217969eb71767ba83a3ff2b5bb38189b19) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHAForNodeLabels.java > Move load labels from storage from serviceInit to serviceStart to make it > works with RM HA case. > > > Key: YARN-3609 > URL: https://issues.apache.org/jira/browse/YARN-3609 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3609.1.preliminary.patch, YARN-3609.2.patch, > YARN-3609.3.patch > > > Now RMNodeLabelsManager loads label when serviceInit, but > RMActiveService.start() is called when RM HA transition happens. > We haven't done this before because queue's initialization happens in > serviceInit as well, we need make sure labels added to system before init > queue, after YARN-2918, we should be able to do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553352#comment-14553352 ] Vrushali C commented on YARN-3411: -- bq. Or we cannot diff value with null and real 0. Hi Junping, So, currently, you are right, we can't differentiate between nulls and real 0. In hRaven, we use 0 in case of nulls for long or int values. But for things like timestamps, we need stricter checks. After the performance test, I will file a jira to ensure we handle this more carefully and return null (Long object) in case it's actually null. Hope that is fine. thanks Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
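[Editor's note] To make the null-vs-zero point concrete, below is a minimal sketch of what a stricter read could look like. The helper class and method name are hypothetical and not part of the current patch.

{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public final class ColumnReadHelper {
  private ColumnReadHelper() {
  }

  // Return null when no cell was ever written for this column, so callers can
  // distinguish "value absent" from a value that is really 0.
  public static Long readLongOrNull(Result result, byte[] family, byte[] qualifier) {
    Cell cell = result.getColumnLatestCell(family, qualifier);
    if (cell == null) {
      return null;
    }
    return Bytes.toLong(CellUtil.cloneValue(cell));
  }
}
{code}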
[jira] [Commented] (YARN-3647) RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object
[ https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553323#comment-14553323 ] Wangda Tan commented on YARN-3647: -- Latest patch LGTM. > RMWebServices api's should use updated api from CommonNodeLabelsManager to > get NodeLabel object > --- > > Key: YARN-3647 > URL: https://issues.apache.org/jira/browse/YARN-3647 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3647.patch, 0002-YARN-3647.patch > > > After YARN-3579, RMWebServices apis can use the updated version of apis in > CommonNodeLabelsManager which gives full NodeLabel object instead of creating > NodeLabel object from plain label name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553302#comment-14553302 ] Hadoop QA commented on YARN-3411: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 4s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 14s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 37m 30s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734244/YARN-3411-YARN-2928.007.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8033/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8033/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8033/console | This message was automatically generated. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553284#comment-14553284 ] Hadoop QA commented on YARN-2556: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 6m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 9m 47s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 31s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 19s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 2m 2s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 51s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | mapreduce tests | 99m 47s | Tests failed in hadoop-mapreduce-client-jobclient. | | | | 120m 54s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.mapred.TestMerge | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734234/YARN-2556.10.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 03f897f | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8031/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/8031/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8031/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8031/console | This message was automatically generated. > Tool to measure the performance of the timeline server > -- > > Key: YARN-2556 > URL: https://issues.apache.org/jira/browse/YARN-2556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Chang Li > Labels: BB2015-05-TBR > Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, > YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.2.patch, YARN-2556.3.patch, > YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, > YARN-2556.8.patch, YARN-2556.9.patch, YARN-2556.patch, yarn2556.patch, > yarn2556.patch, yarn2556_wip.patch > > > We need to be able to understand the capacity model for the timeline server > to give users the tools they need to deploy a timeline server with the > correct capacity. > I propose we create a mapreduce job that can measure timeline server write > and read performance. Transactions per second, I/O for both read and write > would be a good start. > This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3051: --- Attachment: YARN-3051-YARN-2928.03.patch > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051-YARN-2928.03.patch, > YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553189#comment-14553189 ] Xuan Gong commented on YARN-3681: - Committed into trunk/branch-2/branch-2.7. Thanks, craig and varun > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Fix For: 2.7.1 > > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, > YARN-3681.1.patch, YARN-3681.branch-2.0.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553186#comment-14553186 ] Li Lu commented on YARN-3411: - Hi [~vrushalic], sure, don't worry about the test code clean up for now. I'll try it locally. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3411: - Attachment: YARN-3411-YARN-2928.007.patch Uploading YARN-3411-YARN-2928.007.patch. I think I have addressed everyone's comments. I have been going up and down scrolling on this jira page since yesterday and I hope I have not missed out on any comment. [~gtCarrera9] I have not yet moved the test data into TestTimelineWriterImpl since it has almost a similar information setup for timeline entity but with more cases. I can modify it later. I have tested the HBase writer with Sangjin's driver code as well. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, > YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, > YARN-3411.poc.7.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
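For readers following the schema discussion above, here is a minimal sketch of what a native HBase write path for a single timeline entity could look like. The table name, column family, qualifier, and row-key layout below are placeholder assumptions for illustration only; the actual schema is the one proposed in ATSv2BackendHBaseSchemaproposal.pdf and the attached patches.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

public class TimelineEntityWriteSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("timeline_entity"))) {
      // Row key layout (assumed): cluster!user!flow!run!app!entityType!entityId
      byte[] rowKey = "cluster1!alice!flow1!1!app_1!YARN_CONTAINER!container_1"
          .getBytes(StandardCharsets.UTF_8);
      Put put = new Put(rowKey);
      // Write a single "info" cell; the real schema defines the families and qualifiers.
      put.addColumn("i".getBytes(StandardCharsets.UTF_8),
          "created_time".getBytes(StandardCharsets.UTF_8),
          Long.toString(System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8));
      table.put(put);
    }
  }
}
{code}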
[jira] [Updated] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3681: -- Attachment: YARN-3681.branch-2.0.patch Here is one for branch-2 > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, > YARN-3681.1.patch, YARN-3681.branch-2.0.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit
[ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553165#comment-14553165 ] Nathan Roberts commented on YARN-3388: -- Thanks [~leftnoteasy] for the comments. I agree 2b is the way to go. I will upload a new patch soon. > Allocation in LeafQueue could get stuck because DRF calculator isn't well > supported when computing user-limit > - > > Key: YARN-3388 > URL: https://issues.apache.org/jira/browse/YARN-3388 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch, > YARN-3388-v2.patch > > > When there are multiple active users in a queue, it should be possible for > those users to make use of capacity up-to max_capacity (or close). The > resources should be fairly distributed among the active users in the queue. > This works pretty well when there is a single resource being scheduled. > However, when there are multiple resources the situation gets more complex > and the current algorithm tends to get stuck at Capacity. > Example illustrated in subsequent comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553159#comment-14553159 ] Hudson commented on YARN-3681: -- FAILURE: Integrated in Hadoop-trunk-Commit #7875 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7875/]) YARN-3681. yarn cmd says "could not find main class 'queue'" in windows. (xgong: rev 5774f6b1e577ee64bde8c7c1e39f404b9e651176) * hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * hadoop-yarn-project/CHANGES.txt > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, > YARN-3681.1.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553158#comment-14553158 ] Hudson commented on YARN-2918: -- FAILURE: Integrated in Hadoop-trunk-Commit #7875 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7875/]) Move YARN-2918 from 2.8.0 to 2.7.1 (wangda: rev 03f897fd1a3779251023bae358207069b89addbf) * hadoop-yarn-project/CHANGES.txt > Don't fail RM if queue's configured labels are not existed in > cluster-node-labels > - > > Key: YARN-2918 > URL: https://issues.apache.org/jira/browse/YARN-2918 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith >Assignee: Wangda Tan > Fix For: 2.8.0, 2.7.1 > > Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch > > > Currently, if admin setup labels on queues > {{.accessible-node-labels = ...}}. And the label is not added to > RM, queue's initialization will fail and RM will fail too: > {noformat} > 2014-12-03 20:11:50,126 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > ... > Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, > please check. > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.(AbstractCSQueue.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > {noformat} > This is not a good user experience, we should stop fail RM so that admin can > configure queue/labels in following steps: > - Configure queue (with label) > - Start RM > - Add labels to RM > - Submit applications > Now admin has to: > - Configure queue (without label) > - Start RM > - Add labels to RM > - Refresh queue's config (with label) > - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553151#comment-14553151 ] Hadoop QA commented on YARN-3675: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 4s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 86m 17s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734207/YARN-3675.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8030/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8030/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8030/console | This message was automatically generated. > FairScheduler: RM quits when node removal races with continousscheduling on > the same node > - > > Key: YARN-3675 > URL: https://issues.apache.org/jira/browse/YARN-3675 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3675.001.patch, YARN-3675.002.patch, > YARN-3675.003.patch > > > With continuous scheduling, scheduling can be done on a node thats just > removed causing errors like below. 
> {noformat} > 12:28:53.782 AM FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager > Error in handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 12:28:53.783 AMINFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
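The stack trace above comes from FSAppAttempt.unreserve dereferencing a node that was removed while continuous scheduling was still working on it. Below is a minimal sketch of the kind of guard that avoids the race; the class and method names are illustrative assumptions, not the attached patch.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative guard against scheduling on a node that has just been removed.
public class ContinuousSchedulingGuardSketch {
  private final Map<String, Object> nodes = new ConcurrentHashMap<>();

  public void attemptSchedulingOn(String nodeId) {
    Object node = nodes.get(nodeId);
    // Node removal can race with continuous scheduling; bail out instead of
    // dereferencing a stale node reference (the NullPointerException above).
    if (node == null) {
      return;
    }
    synchronized (this) {
      if (!nodes.containsKey(nodeId)) {
        return; // removed between the lookup and taking the scheduler lock
      }
      // ... proceed with allocation / reservation on this node ...
    }
  }
}
{code}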
[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553114#comment-14553114 ] Xuan Gong commented on YARN-3681: - Use git apply -p0 --whitespace=fix could apply the patch. The patch looks good to me. +1 will commit > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, > YARN-3681.1.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553086#comment-14553086 ] Wangda Tan commented on YARN-2918: -- Back-ported this patch to 2.7.1, updating fix version. > Don't fail RM if queue's configured labels are not existed in > cluster-node-labels > - > > Key: YARN-2918 > URL: https://issues.apache.org/jira/browse/YARN-2918 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith >Assignee: Wangda Tan > Fix For: 2.8.0, 2.7.1 > > Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch > > > Currently, if admin setup labels on queues > {{.accessible-node-labels = ...}}. And the label is not added to > RM, queue's initialization will fail and RM will fail too: > {noformat} > 2014-12-03 20:11:50,126 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > ... > Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, > please check. > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.(AbstractCSQueue.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > {noformat} > This is not a good user experience, we should stop fail RM so that admin can > configure queue/labels in following steps: > - Configure queue (with label) > - Start RM > - Add labels to RM > - Submit applications > Now admin has to: > - Configure queue (without label) > - Start RM > - Add labels to RM > - Refresh queue's config (with label) > - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2918: - Fix Version/s: 2.7.1 > Don't fail RM if queue's configured labels are not existed in > cluster-node-labels > - > > Key: YARN-2918 > URL: https://issues.apache.org/jira/browse/YARN-2918 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith >Assignee: Wangda Tan > Fix For: 2.8.0, 2.7.1 > > Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch > > > Currently, if admin setup labels on queues > {{.accessible-node-labels = ...}}. And the label is not added to > RM, queue's initialization will fail and RM will fail too: > {noformat} > 2014-12-03 20:11:50,126 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > ... > Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, > please check. > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.(AbstractCSQueue.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > {noformat} > This is not a good user experience, we should stop fail RM so that admin can > configure queue/labels in following steps: > - Configure queue (with label) > - Start RM > - Add labels to RM > - Submit applications > Now admin has to: > - Configure queue (without label) > - Start RM > - Add labels to RM > - Refresh queue's config (with label) > - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
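To illustrate the behaviour change described above: instead of throwing the IOException shown in the stack trace (which aborts ResourceManager startup), the label check could log and continue, letting the admin add the label afterwards. This is only a hedged sketch with made-up class and method names, not the committed patch.
{code:java}
import java.util.Set;
import java.util.logging.Logger;

// Sketch: tolerate a queue label that is not (yet) in the cluster node labels
// instead of failing ResourceManager startup.
public class QueueLabelCheckSketch {
  private static final Logger LOG =
      Logger.getLogger(QueueLabelCheckSketch.class.getName());

  public static void checkQueueLabels(Set<String> queueLabels,
      Set<String> clusterLabels) {
    for (String label : queueLabels) {
      if (!clusterLabels.contains(label)) {
        // Previously this condition threw an IOException and the RM died;
        // warning instead lets the admin add the label after startup.
        LOG.warning("Queue label " + label
            + " is not in the cluster node labels yet");
      }
    }
  }
}
{code}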
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2556: --- Attachment: YARN-2556.10.patch Add JobHistoryFileReplayMapper mapper > Tool to measure the performance of the timeline server > -- > > Key: YARN-2556 > URL: https://issues.apache.org/jira/browse/YARN-2556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Chang Li > Labels: BB2015-05-TBR > Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, > YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.2.patch, YARN-2556.3.patch, > YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, > YARN-2556.8.patch, YARN-2556.9.patch, YARN-2556.patch, yarn2556.patch, > yarn2556.patch, yarn2556_wip.patch > > > We need to be able to understand the capacity model for the timeline server > to give users the tools they need to deploy a timeline server with the > correct capacity. > I propose we create a mapreduce job that can measure timeline server write > and read performance. Transactions per second, I/O for both read and write > would be a good start. > This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
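As a rough illustration of the kind of mapper such a benchmark could use, here is a hypothetical skeleton echoing the JobHistoryFileReplayMapper mentioned above; it is not the attached patch.
{code:java}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical skeleton of a timeline-server load-generating mapper.
public class TimelineLoadMapperSketch
    extends Mapper<IntWritable, IntWritable, NullWritable, NullWritable> {

  @Override
  protected void map(IntWritable key, IntWritable value, Context context)
      throws IOException, InterruptedException {
    long start = System.nanoTime();
    // The real tool would replay job history events as timeline entities here
    // (e.g. via TimelineClient) and measure write throughput and latency.
    long elapsedNanos = System.nanoTime() - start;
    context.getCounter("timeline-perf", "put-latency-nanos").increment(elapsedNanos);
  }
}
{code}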
[jira] [Comment Edited] (YARN-3691) FairScheduler: Limit number of reservations for a container
[ https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553001#comment-14553001 ] Karthik Kambatla edited comment on YARN-3691 at 5/20/15 8:09 PM: - The number of reservations should be per container and not per application? If an app is looking to get resources for 10 containers, it should be able to make reservations independently for each container. was (Author: kasha): The number of reservations should be per component and not per application? If an app is looking to get resources for 10 containers, it should be able to make reservations independently for each container. > FairScheduler: Limit number of reservations for a container > --- > > Key: YARN-3691 > URL: https://issues.apache.org/jira/browse/YARN-3691 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Arun Suresh >Assignee: Arun Suresh > > Currently, It is possible to reserve resource for an app on all nodes. > Limiting this to possibly just a number of nodes (or a ratio of the total > cluster size) would improve utilization of the cluster and will reduce the > possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553015#comment-14553015 ] Karthik Kambatla commented on YARN-314: --- I am essentially proposing an efficient way to index the pending requests across multiple axes. Each of these indices is captured by a map. The only reason to colocate them is to not disperse this indexing (mapping) logic across multiple classes. We should be able to quickly look up all requests for an app for reporting etc., and also look up all node-local requests across applications at schedule time without having to iterate through all the applications. The maps could be - >>, >>. Current {{AppSchedulingInfo}} could stay as is and use the former map to get the corresponding requests. > Schedulers should allow resource requests of different sizes at the same > priority and location > -- > > Key: YARN-314 > URL: https://issues.apache.org/jira/browse/YARN-314 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza > Attachments: yarn-314-prelim.patch > > > Currently, resource requests for the same container and locality are expected > to all be the same size. > While it it doesn't look like it's needed for apps currently, and can be > circumvented by specifying different priorities if absolutely necessary, it > seems to me that the ability to request containers with different resource > requirements at the same priority level should be there for the future and > for completeness sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
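The comment above does not show the exact generic types of the two maps, so the sketch below is only a guess at the indices being described: one keyed by application for reporting, one keyed by node/resource name for locality lookups at schedule time. All non-JDK names are placeholders.
{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Guess at the two pending-request indices described above; all non-JDK
// names are placeholders.
public class PendingRequestIndexSketch {
  static class ResourceRequest { /* capability, priority, resource name ... */ }

  // appId -> (resourceName -> requests): everything an app is asking for,
  // useful for per-app reporting.
  private final Map<String, Map<String, List<ResourceRequest>>> byApp =
      new ConcurrentHashMap<>();

  // resourceName (node or rack) -> (appId -> requests): lets a heartbeating
  // node find its local requests without iterating over every application.
  private final Map<String, Map<String, List<ResourceRequest>>> byNode =
      new ConcurrentHashMap<>();
}
{code}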
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553000#comment-14553000 ] Hadoop QA commented on YARN-3686: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 20s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 86m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734160/0002-YARN-3686.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8029/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8029/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8029/console | This message was automatically generated. > CapacityScheduler should trim default_node_label_expression > --- > > Key: YARN-3686 > URL: https://issues.apache.org/jira/browse/YARN-3686 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch > > > We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3691) FairScheduler: Limit number of reservations for a container
[ https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553001#comment-14553001 ] Karthik Kambatla commented on YARN-3691: The number of reservations should be per component and not per application? If an app is looking to get resources for 10 containers, it should be able to make reservations independently for each container. > FairScheduler: Limit number of reservations for a container > --- > > Key: YARN-3691 > URL: https://issues.apache.org/jira/browse/YARN-3691 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Arun Suresh >Assignee: Arun Suresh > > Currently, It is possible to reserve resource for an app on all nodes. > Limiting this to possibly just a number of nodes (or a ratio of the total > cluster size) would improve utilization of the cluster and will reduce the > possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
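As a rough sketch of the per-container limit being discussed here, something like the following could cap reservations; the threshold value and all names below are assumptions for illustration, not the eventual patch.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of capping how many nodes one pending container may be reserved on.
public class ReservationLimitSketch {
  private static final int MAX_RESERVATIONS_PER_CONTAINER = 5; // assumed value

  // pending-container key (e.g. priority + capability) -> reservation count
  private final Map<String, Integer> reservationCounts = new ConcurrentHashMap<>();

  public boolean mayReserve(String containerKey) {
    // Refuse further reservations once this container already holds enough,
    // so a single app cannot tie up nodes across the whole cluster.
    return reservationCounts.getOrDefault(containerKey, 0)
        < MAX_RESERVATIONS_PER_CONTAINER;
  }

  public void onReserve(String containerKey) {
    reservationCounts.merge(containerKey, 1, Integer::sum);
  }

  public void onUnreserve(String containerKey) {
    reservationCounts.computeIfPresent(containerKey, (k, v) -> v > 1 ? v - 1 : null);
  }
}
{code}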
[jira] [Updated] (YARN-3691) FairScheduler: Limit number of reservations for a container
[ https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3691: --- Summary: FairScheduler: Limit number of reservations for a container (was: Limit number of reservations for an app) > FairScheduler: Limit number of reservations for a container > --- > > Key: YARN-3691 > URL: https://issues.apache.org/jira/browse/YARN-3691 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Arun Suresh >Assignee: Arun Suresh > > Currently, It is possible to reserve resource for an app on all nodes. > Limiting this to possibly just a number of nodes (or a ratio of the total > cluster size) would improve utilization of the cluster and will reduce the > possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3467) Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552959#comment-14552959 ] Karthik Kambatla commented on YARN-3467: We should add this information to ApplicationAttempt page, and also preferably to the RM Web UI. I have heard asks for both number of containers and allocated resources on the RM applications page, so people can sort applications by that. > Expose allocatedMB, allocatedVCores, and runningContainers metrics on running > Applications in RM Web UI > --- > > Key: YARN-3467 > URL: https://issues.apache.org/jira/browse/YARN-3467 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp, yarn >Affects Versions: 2.5.0 >Reporter: Anthony Rojas >Assignee: Anubhav Dhoot >Priority: Minor > Attachments: ApplicationAttemptPage.png > > > The YARN REST API can report on the following properties: > *allocatedMB*: The sum of memory in MB allocated to the application's running > containers > *allocatedVCores*: The sum of virtual cores allocated to the application's > running containers > *runningContainers*: The number of containers currently running for the > application > Currently, the RM Web UI does not report on these items (at least I couldn't > find any entries within the Web UI). > It would be useful for YARN Application and Resource troubleshooting to have > these properties and their corresponding values exposed on the RM WebUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
[ https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552958#comment-14552958 ] Hadoop QA commented on YARN-2355: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 45s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 50m 1s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734179/YARN-2355.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8028/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8028/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8028/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8028/console | This message was automatically generated. > MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container > -- > > Key: YARN-2355 > URL: https://issues.apache.org/jira/browse/YARN-2355 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Darrell Taylor > Labels: newbie > Attachments: YARN-2355.001.patch > > > After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether > it has the chance to try based on MAX_APP_ATTEMPTS_ENV alone. We should be > able to notify the application of the up-to-date remaining retry quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3675: Attachment: YARN-3675.003.patch Removed spurious changes and changed visibility of attemptScheduling > FairScheduler: RM quits when node removal races with continousscheduling on > the same node > - > > Key: YARN-3675 > URL: https://issues.apache.org/jira/browse/YARN-3675 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3675.001.patch, YARN-3675.002.patch, > YARN-3675.003.patch > > > With continuous scheduling, scheduling can be done on a node thats just > removed causing errors like below. > {noformat} > 12:28:53.782 AM FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager > Error in handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 12:28:53.783 AMINFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552849#comment-14552849 ] Renan DelValle commented on YARN-2408: -- [~leftnoteasy], thanks for taking a look at the patch, really appreciate it. 1) I agree, the original patch I had was very verbose so I shrunk down the amount of data being transferred by clustering resource requests together. Seems to be the best alternative to keeping original ResourceRequest structures. 2) I will take a look at that and implement it that way. (Thank you for pointing me in the right direction). On the resource-by-label inclusion, do you think it would be better to wait until it is patched into the trunk in order to make the process easier? > Resource Request REST API for YARN > -- > > Key: YARN-2408 > URL: https://issues.apache.org/jira/browse/YARN-2408 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp >Reporter: Renan DelValle > Labels: features > Attachments: YARN-2408-6.patch > > > I’m proposing a new REST API for YARN which exposes a snapshot of the > Resource Requests that exist inside of the Scheduler. My motivation behind > this new feature is to allow external software to monitor the amount of > resources being requested to gain more insightful information into cluster > usage than is already provided. The API can also be used by external software > to detect a starved application and alert the appropriate users and/or sys > admin so that the problem may be remedied. > Here is the proposed API (a JSON counterpart is also available): > {code:xml} > > 7680 > 7 > > application_1412191664217_0001 > > appattempt_1412191664217_0001_01 > default > 6144 > 6 > 3 > > > 1024 > 1 > 6 > true > 20 > > localMachine > /default-rack > * > > > > > > ... > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552841#comment-14552841 ] Hadoop QA commented on YARN-3675: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 47s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 51s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 87m 29s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734156/YARN-3675.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8025/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8025/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8025/console | This message was automatically generated. > FairScheduler: RM quits when node removal races with continousscheduling on > the same node > - > > Key: YARN-3675 > URL: https://issues.apache.org/jira/browse/YARN-3675 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3675.001.patch, YARN-3675.002.patch > > > With continuous scheduling, scheduling can be done on a node thats just > removed causing errors like below. 
> {noformat} > 12:28:53.782 AM FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager > Error in handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 12:28:53.783 AMINFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3692) Allow REST API to set a user generated message when killing an application
Rajat Jain created YARN-3692: Summary: Allow REST API to set a user generated message when killing an application Key: YARN-3692 URL: https://issues.apache.org/jira/browse/YARN-3692 Project: Hadoop YARN Issue Type: Improvement Reporter: Rajat Jain Currently YARN's REST API supports killing an application without setting a diagnostic message. It would be good to provide that support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552800#comment-14552800 ] Wangda Tan commented on YARN-314: - [~kasha], Actually I'm not quite sure about this proposal, what's the benefit of putting all apps' requests together comparing to hold one data structure per app, is there any use case? > Schedulers should allow resource requests of different sizes at the same > priority and location > -- > > Key: YARN-314 > URL: https://issues.apache.org/jira/browse/YARN-314 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza > Attachments: yarn-314-prelim.patch > > > Currently, resource requests for the same container and locality are expected > to all be the same size. > While it it doesn't look like it's needed for apps currently, and can be > circumvented by specifying different priorities if absolutely necessary, it > seems to me that the ability to request containers with different resource > requirements at the same priority level should be there for the future and > for completeness sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
[ https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darrell Taylor updated YARN-2355: - Attachment: YARN-2355.001.patch > MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container > -- > > Key: YARN-2355 > URL: https://issues.apache.org/jira/browse/YARN-2355 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Darrell Taylor > Labels: newbie > Attachments: YARN-2355.001.patch > > > After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether > it has the chance to try based on MAX_APP_ATTEMPTS_ENV alone. We should be > able to notify the application of the up-to-date remaining retry quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552738#comment-14552738 ] Varun Saxena commented on YARN-3681: [~cwelch], it has to do with line endings. I have to run {{unix2dos}} to convert line endings for Jenkins to accept it. Windows batch files patches do not always apply depending on settings of line endings done by the user. I think my patch did not apply for you because of that reason. > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, > YARN-3681.1.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552736#comment-14552736 ] Hadoop QA commented on YARN-3681: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734165/YARN-3681.1.patch | | Optional Tests | | | git revision | trunk / 4aa730c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8027/console | This message was automatically generated. > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, > YARN-3681.1.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3681: -- Attachment: YARN-3681.1.patch Oh the irony, neither did my own. Updated to one which does. > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, > YARN-3681.1.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3691) Limit number of reservations for an app
Arun Suresh created YARN-3691: - Summary: Limit number of reservations for an app Key: YARN-3691 URL: https://issues.apache.org/jira/browse/YARN-3691 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Arun Suresh Currently, it is possible to reserve resources for an app on all nodes. Limiting this to possibly just a number of nodes (or a ratio of the total cluster size) would improve utilization of the cluster and will reduce the possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3691) Limit number of reservations for an app
[ https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh reassigned YARN-3691: - Assignee: Arun Suresh > Limit number of reservations for an app > --- > > Key: YARN-3691 > URL: https://issues.apache.org/jira/browse/YARN-3691 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Arun Suresh >Assignee: Arun Suresh > > Currently, It is possible to reserve resource for an app on all nodes. > Limiting this to possibly just a number of nodes (or a ratio of the total > cluster size) would improve utilization of the cluster and will reduce the > possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552710#comment-14552710 ] Hadoop QA commented on YARN-3681: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734163/YARN-3681.0.patch | | Optional Tests | | | git revision | trunk / 4aa730c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8026/console | This message was automatically generated. > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552711#comment-14552711 ] Wangda Tan commented on YARN-3686: -- [~sunilg], thanks for working on this, comments: - I think you can try to add the trimming to {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeNodeLabelExpressionInRequest(ResourceRequest, QueueInfo)}}, which needs to trim the node-label-expression as well - Actually this is a regression; in 2.6 a queue's node label expression with spaces could be set up without any issue. It's better to add tests to make sure 1. spaces in the resource request are trimmed and 2. spaces in the queue configuration (default-node-label-expression) are trimmed. > CapacityScheduler should trim default_node_label_expression > --- > > Key: YARN-3686 > URL: https://issues.apache.org/jira/browse/YARN-3686 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch > > > We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
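A rough illustration of the trimming suggested in the comment above; this is not the actual YARN-3686 patch, only a sketch of the shape such a normalization step could take, assuming the standard ResourceRequest and QueueInfo records.
{code:java}
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class NodeLabelTrimExample {
  /**
   * Normalize the node label expression on a request: fall back to the
   * queue's default expression when none is set, then strip surrounding
   * whitespace so " x " and "x" refer to the same label.
   */
  public static void normalize(ResourceRequest resReq, QueueInfo queueInfo) {
    String labelExp = resReq.getNodeLabelExpression();
    if (labelExp == null && queueInfo != null) {
      labelExp = queueInfo.getDefaultNodeLabelExpression();
    }
    if (labelExp != null) {
      labelExp = labelExp.trim();
    }
    resReq.setNodeLabelExpression(labelExp);
  }
}
{code}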
[jira] [Commented] (YARN-3467) Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552699#comment-14552699 ] Anubhav Dhoot commented on YARN-3467: - Attaching the ApplicationAttempt page. It does show the number of running containers. But it does not show actual allocated resources overall for the application attempt. > Expose allocatedMB, allocatedVCores, and runningContainers metrics on running > Applications in RM Web UI > --- > > Key: YARN-3467 > URL: https://issues.apache.org/jira/browse/YARN-3467 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp, yarn >Affects Versions: 2.5.0 >Reporter: Anthony Rojas >Assignee: Anubhav Dhoot >Priority: Minor > Attachments: ApplicationAttemptPage.png > > > The YARN REST API can report on the following properties: > *allocatedMB*: The sum of memory in MB allocated to the application's running > containers > *allocatedVCores*: The sum of virtual cores allocated to the application's > running containers > *runningContainers*: The number of containers currently running for the > application > Currently, the RM Web UI does not report on these items (at least I couldn't > find any entries within the Web UI). > It would be useful for YARN Application and Resource troubleshooting to have > these properties and their corresponding values exposed on the RM WebUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552700#comment-14552700 ] Craig Welch commented on YARN-3681: --- [~varun_saxena] the patch you had doesn't apply properly for me. I've uploaded a patch which does the same things but does apply, and which I've had the opportunity to test. @xgong, can you take a look at this one (.0.patch)? Thanks. > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3681: -- Attachment: YARN-3681.0.patch > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Blocker > Labels: windows, yarn-client > Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3467) Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3467: Attachment: ApplicationAttemptPage.png > Expose allocatedMB, allocatedVCores, and runningContainers metrics on running > Applications in RM Web UI > --- > > Key: YARN-3467 > URL: https://issues.apache.org/jira/browse/YARN-3467 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp, yarn >Affects Versions: 2.5.0 >Reporter: Anthony Rojas >Assignee: Anubhav Dhoot >Priority: Minor > Attachments: ApplicationAttemptPage.png > > > The YARN REST API can report on the following properties: > *allocatedMB*: The sum of memory in MB allocated to the application's running > containers > *allocatedVCores*: The sum of virtual cores allocated to the application's > running containers > *runningContainers*: The number of containers currently running for the > application > Currently, the RM Web UI does not report on these items (at least I couldn't > find any entries within the Web UI). > It would be useful for YARN Application and Resource troubleshooting to have > these properties and their corresponding values exposed on the RM WebUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552686#comment-14552686 ] Craig Welch commented on YARN-3626: --- Checkstyle looks insignificant. [~cnauroth], [~vinodkv], I've changed the approach to use the environment instead of configuration as suggested, can one of you review pls? > On Windows localized resources are not moved to the front of the classpath > when they should be > -- > > Key: YARN-3626 > URL: https://issues.apache.org/jira/browse/YARN-3626 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn > Environment: Windows >Reporter: Craig Welch >Assignee: Craig Welch > Fix For: 2.7.1 > > Attachments: YARN-3626.0.patch, YARN-3626.11.patch, > YARN-3626.14.patch, YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch > > > In response to the mapreduce.job.user.classpath.first setting the classpath > is ordered differently so that localized resources will appear before system > classpath resources when tasks execute. On Windows this does not work > because the localized resources are not linked into their final location when > the classpath jar is created. To compensate for that localized jar resources > are added directly to the classpath generated for the jar rather than being > discovered from the localized directories. Unfortunately, they are always > appended to the classpath, and so are never preferred over system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3685) NodeManager unnecessarily knows about classpath-jars due to Windows limitations
[ https://issues.apache.org/jira/browse/YARN-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552680#comment-14552680 ] Chris Nauroth commented on YARN-3685: - [~vinodkv], thanks for the notification. I was not aware of this design goal at the time of YARN-316. Perhaps it's possible to move the classpath jar generation to the MR client or AM. It's not immediately obvious to me which of those 2 choices is better. We'd need to change the manifest to use relative paths in the Class-Path attribute instead of absolute paths. (The client and AM are not aware of the exact layout of the NodeManager's {{yarn.nodemanager.local-dirs}}, so the client can't predict the absolute paths at time of container launch.) There is one piece of logic that I don't see how to handle though. Some classpath entries are defined in terms of environment variables. These environment variables are expanded at the NodeManager via the container launch scripts. This was true of Linux even before YARN-316, so in that sense, YARN did already have some classpath logic indirectly. Environment variables cannot be used inside a manifest's Class-Path, so for Windows, NodeManager expands the environment variables before populating Class-Path. It would be incorrect to do the environment variable expansion at the MR client, because it might be running with different configuration than the NodeManager. I suppose if the AM did the expansion, then that would work in most cases, but it creates an assumption that the AM container is running with configuration that matches all NodeManagers in the cluster. I don't believe that assumption exists today. If we do move classpath handling out of the NodeManager, then it would be a backwards-incompatible change, and so it could not be shipped in the 2.x release line. > NodeManager unnecessarily knows about classpath-jars due to Windows > limitations > --- > > Key: YARN-3685 > URL: https://issues.apache.org/jira/browse/YARN-3685 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > > Found this while looking at cleaning up ContainerExecutor via YARN-3648, > making it a sub-task. > YARN *should not* know about classpaths. Our original design modeled around > this. But when we added windows suppport, due to classpath issues, we ended > up breaking this abstraction via YARN-316. We should clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
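For readers unfamiliar with the classpath-jar mechanism discussed above, the following sketch shows how a jar whose manifest Class-Path holds relative entries can be built with plain JDK APIs; the file names are made up, and this is not the YARN-316 implementation or any proposed patch.
{code:java}
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarExample {
  public static void main(String[] args) throws IOException {
    Manifest manifest = new Manifest();
    Attributes attrs = manifest.getMainAttributes();
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    // Class-Path entries are resolved relative to the directory containing
    // the jar, so absolute NodeManager-local paths need not be known when
    // the jar is created (the point raised in the comment above).
    attrs.put(Attributes.Name.CLASS_PATH, "lib/job.jar lib/dep-1.0.jar conf/");
    try (JarOutputStream out =
        new JarOutputStream(new FileOutputStream("classpath.jar"), manifest)) {
      // The manifest alone carries the classpath; no jar entries are needed.
    }
  }
}
{code}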
[jira] [Updated] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3686: -- Attachment: 0002-YARN-3686.patch Uploading another patch covering a negative scenario. > CapacityScheduler should trim default_node_label_expression > --- > > Key: YARN-3686 > URL: https://issues.apache.org/jira/browse/YARN-3686 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch > > > We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3675: Attachment: YARN-3675.002.patch Fixed checkstyle issue > FairScheduler: RM quits when node removal races with continousscheduling on > the same node > - > > Key: YARN-3675 > URL: https://issues.apache.org/jira/browse/YARN-3675 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3675.001.patch, YARN-3675.002.patch > > > With continuous scheduling, scheduling can be done on a node thats just > removed causing errors like below. > {noformat} > 12:28:53.782 AM FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager > Error in handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 12:28:53.783 AMINFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
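The stack trace above results from dereferencing a node that a racing event has already removed. The self-contained sketch below shows only the general guard pattern (look the node up again and skip the operation if it is gone); it uses invented names and is not the YARN-3675 patch.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NodeRemovalRaceExample {
  /** Stand-in for the scheduler's node map; keys are node identifiers. */
  private final Map<String, Object> nodes = new ConcurrentHashMap<>();

  void addNode(String nodeId) {
    nodes.put(nodeId, new Object());
  }

  void removeNode(String nodeId) {
    nodes.remove(nodeId);
  }

  /**
   * Unreserve only if the node is still registered. Returns false (and does
   * nothing) if a racing removal already took the node away, instead of
   * dereferencing a stale reference and throwing a NullPointerException.
   */
  boolean unreserveIfPresent(String nodeId) {
    Object node = nodes.get(nodeId);
    if (node == null) {
      return false;
    }
    // ... perform the actual unreserve bookkeeping against 'node' here ...
    return true;
  }
}
{code}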
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552669#comment-14552669 ] Anubhav Dhoot commented on YARN-2005: - Assigning to myself as I am starting work on this. [~sunilg] let me know if you have made progress on this already. > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-2005: --- Assignee: Anubhav Dhoot > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3647) RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object
[ https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552619#comment-14552619 ] Sunil G commented on YARN-3647: --- Test case failure and findbugs error are not related to this patch. > RMWebServices api's should use updated api from CommonNodeLabelsManager to > get NodeLabel object > --- > > Key: YARN-3647 > URL: https://issues.apache.org/jira/browse/YARN-3647 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3647.patch, 0002-YARN-3647.patch > > > After YARN-3579, RMWebServices apis can use the updated version of apis in > CommonNodeLabelsManager which gives full NodeLabel object instead of creating > NodeLabel object from plain label name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552603#comment-14552603 ] Hudson commented on YARN-3677: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java > Fix findbugs warnings in yarn-server-resourcemanager > > > Key: YARN-3677 > URL: https://issues.apache.org/jira/browse/YARN-3677 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Akira AJISAKA >Assignee: Vinod Kumar Vavilapalli >Priority: Minor > Labels: newbie > Fix For: 2.7.1 > > Attachments: YARN-3677-20150519.txt > > > There is 1 findbugs warning in FileSystemRMStateStore.java. > {noformat} > Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of > time > Unsynchronized access at FileSystemRMStateStore.java: [line 156] > Field > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS > Synchronized 66% of the time > Synchronized access at FileSystemRMStateStore.java: [line 148] > Synchronized access at FileSystemRMStateStore.java: [line 859] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
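The findbugs warning quoted in this issue concerns a field that is read both with and without a lock held. The fragment below is a generic illustration of two common remedies, a volatile flag or synchronizing every access; it does not reproduce the actual FileSystemRMStateStore change.
{code:java}
public class InconsistentSyncExample {
  // Remedy 1: declare the flag volatile so unsynchronized reads are safe
  // and findbugs no longer reports inconsistent synchronization.
  private volatile boolean isHDFS;

  void setIsHDFS(boolean value) {
    isHDFS = value;
  }

  boolean isHDFS() {
    return isHDFS;
  }

  // Remedy 2: keep the field non-volatile but synchronize every access.
  private boolean flag;

  synchronized void setFlag(boolean value) {
    flag = value;
  }

  synchronized boolean getFlag() {
    return flag;
  }
}
{code}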
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552591#comment-14552591 ] Hudson commented on YARN-3583: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3583. Support of NodeLabel object instead of plain String in YarnClient side. (Sunil G via wangda) (wangda: rev 563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto > Support of NodeLabel object instead of plain String in YarnClient side. > --- > > Key: YARN-3583 > URL: https://issues.apache.org/jira/browse/YARN-3583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, > 0003-YARN-3583.patch, 0004-YARN-3583.patch > > > Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. > getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of > using plain label name. > This will help to bring other label details such as Exclusivity to client > side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552601#comment-14552601 ] Hudson commented on YARN-3302: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3302. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: rev c97f32e7b9d9e1d4c80682cc01741579166174d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/CHANGES.txt > TestDockerContainerExecutor should run automatically if it can detect docker > in the usual place > --- > > Key: YARN-3302 > URL: https://issues.apache.org/jira/browse/YARN-3302 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Ravi Prakash >Assignee: Ravindra Kumar Naik > Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, > YARN-3302-trunk.003.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552597#comment-14552597 ] Hudson commented on YARN-2821: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java > Distributed shell app master becomes unresponsive sometimes > --- > > Key: YARN-2821 > URL: https://issues.apache.org/jira/browse/YARN-2821 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-2821.002.patch, YARN-2821.003.patch, > YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, > apache-yarn-2821.1.patch > > > We've noticed that once in a while the distributed shell app master becomes > unresponsive and is eventually killed by the RM. snippet of the logs - > {noformat} > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: > appattempt_1415123350094_0017_01 received 0 previous attempts' running > containers on AM registration. 
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez2:45454 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, allocatedCnt=1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_02, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up > container launch container for > containerid=container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > QUERY_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez3:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez4:45454 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, allocatedCnt=3 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_03, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_04, > containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_05, > containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distrib
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552596#comment-14552596 ] Hudson commented on YARN-3565: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String. (Naganarasimha G R via wangda) (wangda: rev b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java > NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object > instead of String > - > > Key: YARN-3565 > URL: https://issues.apache.org/jira/browse/YARN-3565 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R >Priority: Blocker > Fix For: 2.8.0 > > Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, > YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch > > > Now NM HB/Register uses Set, it will be hard to add new fields if we > want to support specifying NodeLabel type such as exclusivity/constraints, > etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552592#comment-14552592 ] Hudson commented on YARN-3601: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/CHANGES.txt > Fix UT TestRMFailover.testRMWebAppRedirect > -- > > Key: YARN-3601 > URL: https://issues.apache.org/jira/browse/YARN-3601 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp > Environment: Red Hat Enterprise Linux Workstation release 6.5 > (Santiago) >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: test > Fix For: 2.7.1 > > Attachments: YARN-3601.001.patch > > > This test case was not working since the commit from YARN-2605. It failed > with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552549#comment-14552549 ] Hudson commented on YARN-3601: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java > Fix UT TestRMFailover.testRMWebAppRedirect > -- > > Key: YARN-3601 > URL: https://issues.apache.org/jira/browse/YARN-3601 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp > Environment: Red Hat Enterprise Linux Workstation release 6.5 > (Santiago) >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: test > Fix For: 2.7.1 > > Attachments: YARN-3601.001.patch > > > This test case was not working since the commit from YARN-2605. It failed > with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552554#comment-14552554 ] Hudson commented on YARN-2821: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java * hadoop-yarn-project/CHANGES.txt > Distributed shell app master becomes unresponsive sometimes > --- > > Key: YARN-2821 > URL: https://issues.apache.org/jira/browse/YARN-2821 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-2821.002.patch, YARN-2821.003.patch, > YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, > apache-yarn-2821.1.patch > > > We've noticed that once in a while the distributed shell app master becomes > unresponsive and is eventually killed by the RM. snippet of the logs - > {noformat} > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: > appattempt_1415123350094_0017_01 received 0 previous attempts' running > containers on AM registration. 
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez2:45454 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, allocatedCnt=1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_02, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up > container launch container for > containerid=container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > QUERY_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez3:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez4:45454 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, allocatedCnt=3 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_03, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_04, > containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_05, > containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 IN
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552559#comment-14552559 ] Hudson commented on YARN-3677: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/CHANGES.txt > Fix findbugs warnings in yarn-server-resourcemanager > > > Key: YARN-3677 > URL: https://issues.apache.org/jira/browse/YARN-3677 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Akira AJISAKA >Assignee: Vinod Kumar Vavilapalli >Priority: Minor > Labels: newbie > Fix For: 2.7.1 > > Attachments: YARN-3677-20150519.txt > > > There is 1 findbugs warning in FileSystemRMStateStore.java. > {noformat} > Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of > time > Unsynchronized access at FileSystemRMStateStore.java: [line 156] > Field > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS > Synchronized 66% of the time > Synchronized access at FileSystemRMStateStore.java: [line 148] > Synchronized access at FileSystemRMStateStore.java: [line 859] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552548#comment-14552548 ] Hudson commented on YARN-3583: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3583. Support of NodeLabel object instead of plain String in YarnClient side. (Sunil G via wangda) (wangda: rev 563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java > Support of NodeLabel object instead of plain String in YarnClient side. > --- > > Key: YARN-3583 > URL: https://issues.apache.org/jira/browse/YARN-3583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, > 0003-YARN-3583.patch, 0004-YARN-3583.patch > > > Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. > getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of > using plain label name. > This will help to bring other label details such as Exclusivity to client > side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552553#comment-14552553 ] Hudson commented on YARN-3565: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String. (Naganarasimha G R via wangda) (wangda: rev b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java > NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object > instead of String > - > > Key: YARN-3565 > URL: https://issues.apache.org/jira/browse/YARN-3565 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R >Priority: Blocker > Fix For: 2.8.0 > > Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, > YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch > > > Now NM HB/Register uses Set, it will be hard to add new fields if we > want to support specifying NodeLabel type such as exclusivity/constraints, > etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552557#comment-14552557 ] Hudson commented on YARN-3302: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3302. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: rev c97f32e7b9d9e1d4c80682cc01741579166174d1) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java > TestDockerContainerExecutor should run automatically if it can detect docker > in the usual place > --- > > Key: YARN-3302 > URL: https://issues.apache.org/jira/browse/YARN-3302 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Ravi Prakash >Assignee: Ravindra Kumar Naik > Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, > YARN-3302-trunk.003.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552534#comment-14552534 ] Varun Saxena commented on YARN-3051: Well, I am still stuck on trying to get the attribute set via HttpServer2#setAttribute in WebServices class. Will update patch once that is done. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, > YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552436#comment-14552436 ] Hudson commented on YARN-3583: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/]) YARN-3583. Support of NodeLabel object instead of plain String in YarnClient side. (Sunil G via wangda) (wangda: rev 563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java > Support of NodeLabel object instead of plain String in YarnClient side. > --- > > Key: YARN-3583 > URL: https://issues.apache.org/jira/browse/YARN-3583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, > 0003-YARN-3583.patch, 0004-YARN-3583.patch > > > Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. > getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of > using plain label name. > This will help to bring other label details such as Exclusivity to client > side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552447#comment-14552447 ] Hudson commented on YARN-3677: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/]) YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt > Fix findbugs warnings in yarn-server-resourcemanager > > > Key: YARN-3677 > URL: https://issues.apache.org/jira/browse/YARN-3677 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Akira AJISAKA >Assignee: Vinod Kumar Vavilapalli >Priority: Minor > Labels: newbie > Fix For: 2.7.1 > > Attachments: YARN-3677-20150519.txt > > > There is 1 findbugs warning in FileSystemRMStateStore.java. > {noformat} > Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of > time > Unsynchronized access at FileSystemRMStateStore.java: [line 156] > Field > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS > Synchronized 66% of the time > Synchronized access at FileSystemRMStateStore.java: [line 148] > Synchronized access at FileSystemRMStateStore.java: [line 859] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552441#comment-14552441 ] Hudson commented on YARN-3565: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/]) YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String. (Naganarasimha G R via wangda) (wangda: rev b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto > NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object > instead of String > - > > Key: YARN-3565 > URL: https://issues.apache.org/jira/browse/YARN-3565 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R >Priority: Blocker > Fix For: 2.8.0 > > Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, > YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch > > > Now NM HB/Register uses Set, it will be hard to add new fields if we > want to support specifying NodeLabel type such as exclusivity/constraints, > etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552431#comment-14552431 ] Hadoop QA commented on YARN-3344: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 1s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 39m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734126/YARN-3344-trunk.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8024/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8024/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8024/console | This message was automatically generated. > procfs stat file is not in the expected format warning > -- > > Key: YARN-3344 > URL: https://issues.apache.org/jira/browse/YARN-3344 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jon Bringhurst >Assignee: Ravindra Kumar Naik > Attachments: YARN-3344-trunk.005.patch > > > Although this doesn't appear to be causing any functional issues, it is > spamming our log files quite a bit. :) > It appears that the regex in ProcfsBasedProcessTree doesn't work for all > /proc//stat files. 
> Here's the error I'm seeing: > {noformat} > "source_host": "asdf", > "method": "constructProcessInfo", > "level": "WARN", > "message": "Unexpected: procfs stat file is not in the expected format > for process with pid 6953" > "file": "ProcfsBasedProcessTree.java", > "line_number": "514", > "class": "org.apache.hadoop.yarn.util.ProcfsBasedProcessTree", > {noformat} > And here's the basic info on process with pid 6953: > {noformat} > [asdf ~]$ cat /proc/6953/stat > 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 > 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 > 2 18446744073709551615 0 0 17 13 0 0 0 0 0 > [asdf ~]$ ps aux|grep 6953 > root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 > /export/apps/salt/minion-scripts/module-sync.py > jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 > [asdf ~]$ > {noformat} > This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
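A minimal, self-contained Java sketch of the parsing problem described above. The class name and both regexes are illustrative assumptions, not the actual ProcfsBasedProcessTree pattern: a strict pattern that assumes a whitespace-free process name fails on the quoted stat line because the comm field "(python2.6 /expo)" contains a space, while a pattern that reads greedily up to the last ')' still matches.
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative patterns only, not the actual ProcfsBasedProcessTree regex.
public class ProcStatRegexDemo {

  // Strict pattern: assumes the comm field contains no whitespace.
  private static final Pattern STRICT =
      Pattern.compile("^(\\d+)\\s+\\((\\S+)\\)\\s+(\\S)\\s+(.*)$");

  // Tolerant pattern: greedy (.*) runs to the last ')' of the comm field,
  // so names containing spaces or slashes still match.
  private static final Pattern TOLERANT =
      Pattern.compile("^(\\d+)\\s+\\((.*)\\)\\s+(\\S)\\s+(.*)$");

  public static void main(String[] args) {
    String line = "6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364";
    System.out.println("strict matches:   " + STRICT.matcher(line).matches());   // false
    System.out.println("tolerant matches: " + TOLERANT.matcher(line).matches()); // true
    Matcher m = TOLERANT.matcher(line);
    if (m.matches()) {
      // pid=6953, comm=python2.6 /expo, state=S
      System.out.println("pid=" + m.group(1) + ", comm=" + m.group(2)
          + ", state=" + m.group(3));
    }
  }
}
{code}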
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552445#comment-14552445 ] Hudson commented on YARN-3302: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/]) YARN-3302. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: rev c97f32e7b9d9e1d4c80682cc01741579166174d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/CHANGES.txt > TestDockerContainerExecutor should run automatically if it can detect docker > in the usual place > --- > > Key: YARN-3302 > URL: https://issues.apache.org/jira/browse/YARN-3302 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Ravi Prakash >Assignee: Ravindra Kumar Naik > Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, > YARN-3302-trunk.003.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552437#comment-14552437 ] Hudson commented on YARN-3601: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/]) YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/CHANGES.txt > Fix UT TestRMFailover.testRMWebAppRedirect > -- > > Key: YARN-3601 > URL: https://issues.apache.org/jira/browse/YARN-3601 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp > Environment: Red Hat Enterprise Linux Workstation release 6.5 > (Santiago) >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: test > Fix For: 2.7.1 > > Attachments: YARN-3601.001.patch > > > This test case was not working since the commit from YARN-2605. It failed > with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552442#comment-14552442 ] Hudson commented on YARN-2821: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java > Distributed shell app master becomes unresponsive sometimes > --- > > Key: YARN-2821 > URL: https://issues.apache.org/jira/browse/YARN-2821 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-2821.002.patch, YARN-2821.003.patch, > YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, > apache-yarn-2821.1.patch > > > We've noticed that once in a while the distributed shell app master becomes > unresponsive and is eventually killed by the RM. snippet of the logs - > {noformat} > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: > appattempt_1415123350094_0017_01 received 0 previous attempts' running > containers on AM registration. 
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez2:45454 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, allocatedCnt=1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_02, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up > container launch container for > containerid=container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > QUERY_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez3:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez4:45454 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, allocatedCnt=3 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_03, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_04, > containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_05, > containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distrib
[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552416#comment-14552416 ] Karthik Kambatla commented on YARN-314: --- Discussed this with [~asuresh] offline. We were wondering if AppSchedulingInfo should be supplemented (or replaced) by another singleton data structure that captures pending requests and maintains multiple maps - to index these requests by both apps and nodes/racks. We should of course add other convenience methods to add/remove or query these requests. > Schedulers should allow resource requests of different sizes at the same > priority and location > -- > > Key: YARN-314 > URL: https://issues.apache.org/jira/browse/YARN-314 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza > Attachments: yarn-314-prelim.patch > > > Currently, resource requests for the same container and locality are expected > to all be the same size. > While it doesn't look like it's needed for apps currently, and can be > circumvented by specifying different priorities if absolutely necessary, it > seems to me that the ability to request containers with different resource > requirements at the same priority level should be there for the future and > for completeness' sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
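A hedged Java sketch of the kind of shared pending-request index floated in the YARN-314 comment above: one structure holding the requests, with maps keyed by application and by node/rack so either side can be queried. Every name and field here is an illustrative assumption, not an existing YARN class or the AppSchedulingInfo API.
{code}
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative pending-request index; all names are assumptions.
public class PendingRequestIndex {

  public static final class PendingRequest {
    final String appId;
    final String resourceName;   // node name, rack name, or "*"
    final int numContainers;

    PendingRequest(String appId, String resourceName, int numContainers) {
      this.appId = appId;
      this.resourceName = resourceName;
      this.numContainers = numContainers;
    }
  }

  // Two views over the same requests: by application and by node/rack.
  private final Map<String, Set<PendingRequest>> byApp = new HashMap<>();
  private final Map<String, Set<PendingRequest>> byLocation = new HashMap<>();

  public synchronized void add(PendingRequest req) {
    byApp.computeIfAbsent(req.appId, k -> new HashSet<>()).add(req);
    byLocation.computeIfAbsent(req.resourceName, k -> new HashSet<>()).add(req);
  }

  public synchronized void remove(PendingRequest req) {
    Set<PendingRequest> forApp = byApp.get(req.appId);
    if (forApp != null) {
      forApp.remove(req);
    }
    Set<PendingRequest> forLocation = byLocation.get(req.resourceName);
    if (forLocation != null) {
      forLocation.remove(req);
    }
  }

  public synchronized Collection<PendingRequest> pendingForApp(String appId) {
    return byApp.getOrDefault(appId, Collections.emptySet());
  }

  public synchronized Collection<PendingRequest> pendingForLocation(String name) {
    return byLocation.getOrDefault(name, Collections.emptySet());
  }
}
{code}
With such an index, a scheduler could ask pendingForLocation(nodeOrRack) when a node heartbeats instead of iterating every application's own bookkeeping; whether that is practical for YARN is exactly the open question in the comment above.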
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552404#comment-14552404 ] Hadoop QA commented on YARN-3646: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 6s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 51s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | | | 45m 47s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734115/YARN-3646.002.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 4aa730c | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8023/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8023/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8023/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8023/console | This message was automatically generated. > Applications are getting stuck some times in case of retry policy forever > - > > Key: YARN-3646 > URL: https://issues.apache.org/jira/browse/YARN-3646 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Reporter: Raju Bairishetti > Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch > > > We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER > retry policy. > Yarn client is infinitely retrying in case of exceptions from the RM as it is > using retrying policy as FOREVER. The problem is it is retrying for all kinds > of exceptions (like ApplicationNotFoundException), even though it is not a > connection failure. Due to this my application is not progressing further. > *Yarn client should not retry infinitely in case of non connection failures.* > We have written a simple yarn-client which is trying to get an application > report for an invalid or older appId. ResourceManager is throwing an > ApplicationNotFoundException as this is an invalid or older appId. 
But > because of retry policy FOREVER, the client keeps retrying to get the > application report and the ResourceManager throws > ApplicationNotFoundException continuously. > {code} > private void testYarnClientRetryPolicy() throws Exception{ > YarnConfiguration conf = new YarnConfiguration(); > conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > -1); > YarnClient yarnClient = YarnClient.createYarnClient(); > yarnClient.init(conf); > yarnClient.start(); > ApplicationId appId = ApplicationId.newInstance(1430126768987L, > 10645); > ApplicationReport report = yarnClient.getApplicationReport(appId); > } > {code} > *RM logs:* > {noformat} > 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport > from 10.14.120.231:61621 Call#875162 Retry#0 > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1430126768987_10645' doesn't exist in RM. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) > at > org.apache.hadoop.yarn.api.impl.pb.servic
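A minimal sketch of the behaviour argued for in YARN-3646, independent of Hadoop's actual retry-policy plumbing (the SelectiveRetry class and callWithRetry method are illustrative assumptions, not part of Hadoop): keep retrying only when the failure looks like a connectivity problem, and surface application-level errors such as ApplicationNotFoundException immediately.
{code}
import java.net.ConnectException;
import java.util.concurrent.Callable;

import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

// Sketch only; the names are assumptions. Connectivity problems are retried
// indefinitely (the intent of the FOREVER policy), while application-level
// errors fail fast instead of looping forever.
public final class SelectiveRetry {

  private SelectiveRetry() {
  }

  public static <T> T callWithRetry(Callable<T> call, long retryIntervalMs)
      throws Exception {
    while (true) {
      try {
        return call.call();
      } catch (ApplicationNotFoundException e) {
        // Retrying can never succeed for a missing application; rethrow at once.
        throw e;
      } catch (ConnectException e) {
        // RM unreachable; wait and try again.
        Thread.sleep(retryIntervalMs);
      }
    }
  }
}
{code}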
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552392#comment-14552392 ] Hudson commented on YARN-2821: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java > Distributed shell app master becomes unresponsive sometimes > --- > > Key: YARN-2821 > URL: https://issues.apache.org/jira/browse/YARN-2821 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-2821.002.patch, YARN-2821.003.patch, > YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, > apache-yarn-2821.1.patch > > > We've noticed that once in a while the distributed shell app master becomes > unresponsive and is eventually killed by the RM. snippet of the logs - > {noformat} > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: > appattempt_1415123350094_0017_01 received 0 previous attempts' running > containers on AM registration. 
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez2:45454 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, allocatedCnt=1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_02, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up > container launch container for > containerid=container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > QUERY_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez3:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez4:45454 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, allocatedCnt=3 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_03, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_04, > containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_05, > containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552386#comment-14552386 ] Hudson commented on YARN-3583: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/]) YARN-3583. Support of NodeLabel object instead of plain String in YarnClient side. (Sunil G via wangda) (wangda: rev 563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java > Support of NodeLabel object instead of plain String in YarnClient side. > --- > > Key: YARN-3583 > URL: https://issues.apache.org/jira/browse/YARN-3583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, > 0003-YARN-3583.patch, 0004-YARN-3583.patch > > > Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. > getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of > using plain label name. > This will help to bring other label details such as Exclusivity to client > side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552395#comment-14552395 ] Hudson commented on YARN-3302: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/]) YARN-3302. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: rev c97f32e7b9d9e1d4c80682cc01741579166174d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/CHANGES.txt > TestDockerContainerExecutor should run automatically if it can detect docker > in the usual place > --- > > Key: YARN-3302 > URL: https://issues.apache.org/jira/browse/YARN-3302 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Ravi Prakash >Assignee: Ravindra Kumar Naik > Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, > YARN-3302-trunk.003.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552397#comment-14552397 ] Hudson commented on YARN-3677: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/]) YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt > Fix findbugs warnings in yarn-server-resourcemanager > > > Key: YARN-3677 > URL: https://issues.apache.org/jira/browse/YARN-3677 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Akira AJISAKA >Assignee: Vinod Kumar Vavilapalli >Priority: Minor > Labels: newbie > Fix For: 2.7.1 > > Attachments: YARN-3677-20150519.txt > > > There is 1 findbugs warning in FileSystemRMStateStore.java. > {noformat} > Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of > time > Unsynchronized access at FileSystemRMStateStore.java: [line 156] > Field > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS > Synchronized 66% of the time > Synchronized access at FileSystemRMStateStore.java: [line 148] > Synchronized access at FileSystemRMStateStore.java: [line 859] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
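For reference, two standard ways to clear an "inconsistent synchronization" findbugs warning on a simple flag like the one reported above. This is illustrative Java under the assumption that the flag is a plain read/write value; it is not the actual FileSystemRMStateStore change.
{code}
// Illustrative only, not FileSystemRMStateStore code.
public class StateStoreFlagExample {

  // Remedy 1: make the field volatile so an unsynchronized read always sees
  // the latest write.
  private volatile boolean isHDFS;

  public boolean isOnHDFS() {
    return isHDFS;                 // safe unsynchronized read
  }

  public void setOnHDFS(boolean value) {
    isHDFS = value;
  }

  // Remedy 2: keep the field non-volatile but guard every access with the
  // same lock, so findbugs sees 100% synchronized access.
  private boolean guardedFlag;

  public synchronized void setGuardedFlag(boolean value) {
    guardedFlag = value;
  }

  public synchronized boolean getGuardedFlag() {
    return guardedFlag;
  }
}
{code}
Which remedy fits depends on whether the flag is ever updated together with other state; a volatile field is enough only when each access stands alone.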
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552391#comment-14552391 ] Hudson commented on YARN-3565: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/]) YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String. (Naganarasimha G R via wangda) (wangda: rev b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java > NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object > instead of String > - > > Key: YARN-3565 > URL: https://issues.apache.org/jira/browse/YARN-3565 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R >Priority: Blocker > Fix For: 2.8.0 > > Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, > YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch > > > Now NM HB/Register uses Set, it will be hard to add new fields if we > want to support specifying NodeLabel type such as exclusivity/constraints, > etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552387#comment-14552387 ] Hudson commented on YARN-3601: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/]) YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/CHANGES.txt > Fix UT TestRMFailover.testRMWebAppRedirect > -- > > Key: YARN-3601 > URL: https://issues.apache.org/jira/browse/YARN-3601 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp > Environment: Red Hat Enterprise Linux Workstation release 6.5 > (Santiago) >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: test > Fix For: 2.7.1 > > Attachments: YARN-3601.001.patch > > > This test case was not working since the commit from YARN-2605. It failed > with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552375#comment-14552375 ] MENG DING commented on YARN-1902: - I have been experimenting with the idea of changing AppSchedulingInfo to maintain a total request table, a fulfilled allocation table, and then calculate the difference between the two tables as the real outstanding request table used for scheduling. All is fine until I realized that this cannot handle one use case where an AMRMClient, right before sending the allocation heartbeat, removes all container requests and adds new container requests at the same priority and location (possibly with different resource capability). AppSchedulingInfo does not know about this, and may not treat the newly added container requests as outstanding requests. I agree that currently I do not see a clean solution without affecting backward compatibility. > Allocation of too many containers when a second request is done with the same > resource capability > - > > Key: YARN-1902 > URL: https://issues.apache.org/jira/browse/YARN-1902 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0, 2.3.0, 2.4.0 >Reporter: Sietse T. Au >Assignee: Sietse T. Au > Labels: client > Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch > > > Regarding AMRMClientImpl > Scenario 1: > Given a ContainerRequest x with Resource y, when addContainerRequest is > called z times with x, allocate is called and at least one of the z allocated > containers is started, then if another addContainerRequest call is done and > subsequently an allocate call to the RM, (z+1) containers will be allocated, > where 1 container is expected. > Scenario 2: > No containers are started between the allocate calls. > Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) containers > are requested in both scenarios, but that only in the second scenario is the > correct behavior observed. > Looking at the implementation I have found that this (z+1) request is caused > by the structure of the remoteRequestsTable. The consequence of Map<..., ResourceRequestInfo> is that ResourceRequestInfo does not hold any > information about whether a request has been sent to the RM yet or not. > There are workarounds for this, such as releasing the excess containers > received. > The solution implemented is to initialize a new ResourceRequest in > ResourceRequestInfo when a request has been successfully sent to the RM. > The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
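A hedged sketch of the requested-minus-fulfilled bookkeeping MENG DING describes above; class and key names are illustrative assumptions, not AppSchedulingInfo fields. Total asks and fulfilled allocations are tracked per request key (for example priority + location + capability), and the outstanding ask is their difference.
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only; names are assumptions, not AppSchedulingInfo fields.
public class OutstandingRequestTable {

  private final Map<String, Integer> requested = new HashMap<>();
  private final Map<String, Integer> fulfilled = new HashMap<>();

  public void addRequest(String key, int numContainers) {
    requested.merge(key, numContainers, Integer::sum);
  }

  public void recordAllocation(String key) {
    fulfilled.merge(key, 1, Integer::sum);
  }

  // Outstanding = requested - fulfilled, floored at zero.
  public int outstanding(String key) {
    int asked = requested.getOrDefault(key, 0);
    int got = fulfilled.getOrDefault(key, 0);
    return Math.max(0, asked - got);
  }
}
{code}
As the comment notes, this breaks down if the client removes all requests and re-adds new ones under the same key right before a heartbeat: the running totals can no longer tell old asks from new ones, so the difference may undercount the requests that are genuinely outstanding.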