[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597265#comment-14597265 ] Varun Saxena commented on YARN-2902: Looking at the public localization code, I do not think public resources can be orphaned, because we do not stop their localization midway on container cleanup. It is difficult to ascertain from the logs, though, why localization was failing for public resources in the scenario mentioned above. From what little I could see in the code, I could not find anything concrete that explains the failures. In any case, the scope of this JIRA, i.e. the orphaning of resources, would not apply to PUBLIC resources IMHO, and there is no point in further delaying this JIRA hoping to find out what went wrong with public resources in the scenario above. bq. What's not clear to me is whether the trigger was the public localization timing out or the stopContainer request The reference count can drop to 0 if a container is killed while downloading. Coming to the patch, there are two approaches to handle this: # Cleanup of downloading resources can be done by the Localization Service while doing container cleanup. # On a heartbeat from the container localizer, if the localizer runner is already stopped, we can tell the localizer runner to clean up the downloading resources. The attached patch adopts approach 1: we wait for the container localizer to die before running the deletion tasks. A downloading resource can be either in its local directory or in the local directory suffixed with {{_tmp}}, so we try both. Moreover, a localization-failed event is sent to all containers referring to a resource that is in the DOWNLOADING state. 
> Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
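The two-location deletion step described in the comment above (trying both the resource's local path and its {{_tmp}}-suffixed twin) could be sketched roughly as follows. This is a hedged illustration with hypothetical names, not code from the actual patch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not the actual YARN-2902 patch): a resource whose
// container was killed while in the DOWNLOADING state may sit either at its
// final local path or at the same path with a "_tmp" suffix, so cleanup
// must try both locations.
public class DownloadingCleanupSketch {

  /** Both places a half-downloaded resource can live. */
  static List<Path> candidatePaths(Path localPath) {
    return Arrays.asList(
        localPath,
        localPath.resolveSibling(localPath.getFileName() + "_tmp"));
  }

  /** Delete whichever candidates exist; returns the number removed. */
  static int cleanup(Path localPath) throws IOException {
    int removed = 0;
    for (Path p : candidatePaths(localPath)) {
      if (Files.deleteIfExists(p)) {
        removed++;
      }
    }
    return removed;
  }
}
```

In the real NodeManager the deletion would be handed to the deletion service after the localizer process has exited, rather than performed inline as here.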
[jira] [Commented] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597255#comment-14597255 ] Masatake Iwasaki commented on YARN-3705: The failure of TestRMRestart seems to be the same issue as YARN-2871. > forcemanual transitionToStandby in RM-HA automatic-failover mode should > change elector state > > > Key: YARN-3705 > URL: https://issues.apache.org/jira/browse/YARN-3705 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: YARN-3705.001.patch, YARN-3705.002.patch, > YARN-3705.003.patch > > > Executing {{rmadmin -transitionToStandby --forcemanual}} in > automatic-failover.enabled mode makes ResouceManager standby while keeping > the state of ActiveStandbyElector. It should make elector to quit and rejoin > in order to enable other candidates to promote, otherwise forcemanual > transition should not be allowed in automatic-failover mode in order to avoid > confusion.
[jira] [Commented] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597229#comment-14597229 ] Hadoop QA commented on YARN-3705: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 13s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 7m 0s | Tests passed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 50m 41s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 97m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741212/YARN-3705.003.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 99271b7 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8323/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8323/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8323/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8323/console | This message was automatically generated. > forcemanual transitionToStandby in RM-HA automatic-failover mode should > change elector state > > > Key: YARN-3705 > URL: https://issues.apache.org/jira/browse/YARN-3705 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: YARN-3705.001.patch, YARN-3705.002.patch, > YARN-3705.003.patch > > > Executing {{rmadmin -transitionToStandby --forcemanual}} in > automatic-failover.enabled mode makes ResouceManager standby while keeping > the state of ActiveStandbyElector. It should make elector to quit and rejoin > in order to enable other candidates to promote, otherwise forcemanual > transition should not be allowed in automatic-failover mode in order to avoid > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597189#comment-14597189 ] Hadoop QA commented on YARN-3800: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 5s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 10m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 6s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 28s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 0s | The applied patch generated 1 new checkstyle issues (total was 54, now 49). | | {color:green}+1{color} | whitespace | 0m 5s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 58s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 45m 33s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 94m 11s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | | org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741211/YARN-3800.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 99271b7 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8321/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8321/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8321/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8321/console | This message was automatically generated. > Simplify inmemory state for ReservationAllocation > - > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597187#comment-14597187 ] Hadoop QA commented on YARN-3838: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 52s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 34s | The applied patch generated 1 new checkstyle issues (total was 39, now 40). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 22m 2s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. 
| | | | 66m 50s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740917/0001-YARN-3838.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 99271b7 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/diffcheckstylehadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8322/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8322/console | This message was automatically generated. > Rest API failing when ip configured in RM address in secure https mode > -- > > Key: YARN-3838 > URL: https://issues.apache.org/jira/browse/YARN-3838 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, > 0001-YARN-3838.patch, 0002-YARN-3810.patch > > > Steps to reproduce > === > 1.Configure hadoop.http.authentication.kerberos.principal as below > {code:xml} > > hadoop.http.authentication.kerberos.principal > HTTP/_h...@hadoop.com > > {code} > 2. In RM web address also configure IP > 3. 
Startup RM > Call Rest API for RM {{ curl -i -k --insecure --negotiate -u : https IP > /ws/v1/cluster/info"}} > *Actual* > Rest API failing > {code} > 2015-06-16 19:03:49,845 DEBUG > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: No valid credentials provided > (Mechanism level: Failed to find any Kerberos credentails) > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos credentails) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519) > at > org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) > {code} -- This message was
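A plausible reading of the failure above: Kerberos service principals such as {{HTTP/_HOST@HADOOP.COM}} have their {{_HOST}} token substituted with the configured web address, so configuring a raw IP there yields a principal with no matching keytab entry, and SPNEGO authentication fails. The substitution idea can be sketched with hypothetical helpers (this is not Hadoop's actual SecurityUtil code):

```java
// Hedged illustration of _HOST principal substitution (hypothetical code,
// not Hadoop's SecurityUtil): the _HOST token in a configured principal is
// replaced with the server's configured address. If that address is a raw
// IP, the resulting principal will not match any HTTP/<hostname>@REALM
// entry in the keytab.
public class PrincipalSubstitutionSketch {

  /** Replace the _HOST placeholder with the configured server address. */
  static String substituteHost(String principalConfig, String address) {
    return principalConfig.replace("_HOST", address);
  }

  /** Very rough check: dotted-quad IPv4 literals contain only digits and dots. */
  static boolean looksLikeRawIp(String address) {
    return address.matches("\\d{1,3}(\\.\\d{1,3}){3}");
  }
}
```

Under this reading, a fix would resolve the configured IP to a hostname before substitution (or require a hostname in the RM web address) so the principal matches the keytab.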
[jira] [Commented] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597176#comment-14597176 ] Masatake Iwasaki commented on YARN-3705:
bq. ResourceManager#handleTransitionToStandBy is expected to be used only when automatic failover enabled.
This was not true: it checks not {{isAutomaticFailoverEnabled}} but {{isHAEnabled}}. {{ResourceManager#handleTransitionToStandBy}} is a no-op if {{RMContext#isHAEnabled}} is false.
{code}
public void handleTransitionToStandBy() {
  if (rmContext.isHAEnabled()) {
    try {
      // Transition to standby and reinit active services
      LOG.info("Transitioning RM to Standby mode");
      transitionToStandby(true);
      adminService.resetLeaderElection();
      return;
    } catch (Exception e) {
      LOG.fatal("Failed to transition RM to Standby mode.");
      ExitUtil.terminate(1, e);
    }
  }
}
{code}
It seems strange that doing nothing in transitionToStandby when {{isHAEnabled}} is false affects tests for HA... > forcemanual transitionToStandby in RM-HA automatic-failover mode should > change elector state > > > Key: YARN-3705 > URL: https://issues.apache.org/jira/browse/YARN-3705 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: YARN-3705.001.patch, YARN-3705.002.patch, > YARN-3705.003.patch > > > Executing {{rmadmin -transitionToStandby --forcemanual}} in > automatic-failover.enabled mode makes ResouceManager standby while keeping > the state of ActiveStandbyElector. It should make elector to quit and rejoin > in order to enable other candidates to promote, otherwise forcemanual > transition should not be allowed in automatic-failover mode in order to avoid > confusion.
[jira] [Updated] (YARN-3841) [Storage implementation] Create HDFS backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3841: - Summary: [Storage implementation] Create HDFS backing storage implementation for ATS writes (was: [Storage abstraction] Create HDFS backing storage implementation for ATS writes) > [Storage implementation] Create HDFS backing storage implementation for ATS > writes > -- > > Key: YARN-3841 > URL: https://issues.apache.org/jira/browse/YARN-3841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > > HDFS backing storage is useful for following scenarios. > 1. For Hadoop clusters which don't run HBase. > 2. For fallback from HBase when HBase cluster is temporary unavailable. > Quoting ATS design document of YARN-2928: > {quote} > In the case the HBase > storage is not available, the plugin should buffer the writes temporarily > (e.g. HDFS), and flush > them once the storage comes back online. Reading and writing to hdfs as the > the backup storage > could potentially use the HDFS writer plugin unless the complexity of > generalizing the HDFS > writer plugin for this purpose exceeds the benefits of reusing it here. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
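The buffering behavior quoted above from the ATS design document — write to the primary store when it is up, buffer to a secondary store (e.g. HDFS) when it is down, and flush once the primary comes back — could be sketched as follows. All names are hypothetical; this is only an illustration of the design idea, not the plugin's implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the fallback idea (hypothetical names): writes go
// to the primary store when it is available, are buffered when it is not,
// and the buffer is drained in order once the primary recovers.
public class FallbackWriterSketch {
  interface Store { void write(String entity) throws Exception; }

  private final Store primary;
  private final Deque<String> buffer = new ArrayDeque<>(); // stand-in for an HDFS-backed buffer

  FallbackWriterSketch(Store primary) { this.primary = primary; }

  void write(String entity) {
    try {
      flushBuffer();                 // drain older buffered writes first, preserving order
      primary.write(entity);
    } catch (Exception unavailable) {
      buffer.addLast(entity);        // primary down: buffer instead of losing the write
    }
  }

  void flushBuffer() throws Exception {
    while (!buffer.isEmpty()) {
      primary.write(buffer.peekFirst());
      buffer.removeFirst();          // remove only after a successful write
    }
  }

  int buffered() { return buffer.size(); }
}
```

A production version would persist the buffer durably (the point of using HDFS) rather than holding it in memory, and would retry flushing asynchronously.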
[jira] [Updated] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-3705: --- Attachment: YARN-3705.003.patch The test failure is relevant. ResourceManager#handleTransitionToStandBy is expected to be used only when automatic failover is enabled. I am attaching 003, which addresses the non-automatic-failover case too. > forcemanual transitionToStandby in RM-HA automatic-failover mode should > change elector state > > > Key: YARN-3705 > URL: https://issues.apache.org/jira/browse/YARN-3705 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: YARN-3705.001.patch, YARN-3705.002.patch, > YARN-3705.003.patch > > > Executing {{rmadmin -transitionToStandby --forcemanual}} in > automatic-failover.enabled mode makes ResouceManager standby while keeping > the state of ActiveStandbyElector. It should make elector to quit and rejoin > in order to enable other candidates to promote, otherwise forcemanual > transition should not be allowed in automatic-failover mode in order to avoid > confusion.
[jira] [Updated] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3800: Attachment: YARN-3800.003.patch Fixed checkstyle. > Simplify inmemory state for ReservationAllocation > - > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we need. Ultimately we convert > everything to resources anyway
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596997#comment-14596997 ] Tsuyoshi Ozawa commented on YARN-3798: -- After the ZK server closes the connection, the ZK client in ZKRMStateStore will receive CONNECTIONLOSS and handle it without creating a new session. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! > 2015-06-09 10:09:44,887 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > updating appAttempt: appattempt_1433764310492_7152_01 > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemana
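The retry policy under discussion on this issue — retry CONNECTIONLOSS on the existing session, and create a new session only on SESSIONEXPIRED — can be sketched as below. This is a simplified, self-contained illustration with hypothetical names, not the actual ZKRMStateStore code:

```java
// Hedged sketch of the ZK retry policy being discussed (hypothetical code):
// CONNECTIONLOSS is retried on the same session, while only SESSIONEXPIRED
// forces a new one. After maxRetries the error is rethrown ("Maxed out ZK
// retries. Giving up!" in the quoted log).
public class ZkRetrySketch {
  enum ZkError { CONNECTIONLOSS, SESSIONEXPIRED }

  interface ZkOp { void run() throws ZkException; }

  static class ZkException extends Exception {
    final ZkError code;
    ZkException(ZkError code) { this.code = code; }
  }

  int sessionGeneration = 0;           // bumped only when a new session is created
  void createNewSession() { sessionGeneration++; }

  void runWithRetries(ZkOp op, int maxRetries) throws ZkException {
    for (int attempt = 0; ; attempt++) {
      try {
        op.run();
        return;
      } catch (ZkException e) {
        if (attempt >= maxRetries) throw e;       // maxed out retries: give up
        if (e.code == ZkError.SESSIONEXPIRED) {
          createNewSession();                     // the only case needing a new session
        }
        // CONNECTIONLOSS: simply retry on the existing session
      }
    }
  }
}
```

The bug described in the issue title would correspond to calling {{createNewSession()}} in the CONNECTIONLOSS branch as well.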
[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596994#comment-14596994 ] Hadoop QA commented on YARN-3792: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 29s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 4s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 5m 59s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 8m 10s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 11s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 51m 49s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 17s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 115m 20s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741171/YARN-3792-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 8c036a1 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8319/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8319/console | This message was automatically generated. 
> Test case failures in TestDistributedShell and some issue fixes related to > ATSV2 > > > Key: YARN-3792 > URL: https://issues.apache.org/jira/browse/YARN-3792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-3792-YARN-2928.001.patch, > YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch, > YARN-3792-YARN-2928.004.patch > > > # encountered [testcase > failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] > which was happening even without the patch modifications in YARN-3044 > TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow > TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow > TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression > # Remove unused {{enableATSV1}} in testDisstributedShell > # container metrics needs to be published only for v2 test cases of > testDisstributedShell > # Nullpointer was
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596990#comment-14596990 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~vinodkv] the patch applies only to branch-2.7, because the ZKRMStateStore of 2.8 or later uses Apache Curator. I'm running the tests locally under hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager, so I'll report the result manually. Double-checking is welcome. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! > 2015-06-09 10:09:44,887 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > updating appAttempt: appattempt_1433764310492_7152_01 > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeepe
[jira] [Commented] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596989#comment-14596989 ] Hadoop QA commented on YARN-3705: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 55s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 18s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 5m 41s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 50m 55s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 96m 54s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741178/YARN-3705.002.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / fac4e04 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8320/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8320/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8320/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8320/console | This message was automatically generated. > forcemanual transitionToStandby in RM-HA automatic-failover mode should > change elector state > > > Key: YARN-3705 > URL: https://issues.apache.org/jira/browse/YARN-3705 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: YARN-3705.001.patch, YARN-3705.002.patch > > > Executing {{rmadmin -transitionToStandby --forcemanual}} in > automatic-failover.enabled mode makes ResouceManager standby while keeping > the state of ActiveStandbyElector. It should make elector to quit and rejoin > in order to enable other candidates to promote, otherwise forcemanual > transition should not be allowed in automatic-failover mode in order to avoid > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
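The elector behaviour this issue asks for (quit and rejoin on a forced transition to standby) can be sketched as below. All class and method names here are toy stand-ins invented for illustration, not Hadoop's actual ActiveStandbyElector or ResourceManager APIs.

```java
// Toy sketch (assumed names, not YARN's real classes) of the fix this issue
// asks for: a forced transition to standby should also make the elector quit
// and rejoin the election, so other candidates get a chance to become active.
public class ForcedStandbySketch {
    static class Elector {
        boolean candidate = true;
        int elections = 1;            // times we have (re)joined the election
        void quitElection() { candidate = false; }
        void joinElection() { candidate = true; elections++; }
    }

    static class Rm {
        final Elector elector = new Elector();
        String haState = "active";

        // Without the fix, this would only flip haState and leave the elector
        // believing it still holds leadership, blocking other candidates.
        void forceTransitionToStandby() {
            haState = "standby";
            elector.quitElection();   // release leadership so peers can win
            elector.joinElection();   // rejoin as an ordinary candidate
        }
    }

    public static void main(String[] args) {
        Rm rm = new Rm();
        rm.forceTransitionToStandby();
        System.out.println(rm.haState + " " + rm.elector.elections);
    }
}
```

The point of the sketch is the ordering: the elector is released before rejoining, so a standby transition never leaves a stale leader in the election.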
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596987#comment-14596987 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~zxu] In the case of SessionMovedException, I think the zk client should automatically retry connecting to another zk server with the same session id, without creating a new session. If we create a new session on SessionMovedException, we'll face the same issue that Bibin and Varun reported. With the new patch, SessionMovedException is handled within the same session: after we get a SessionMovedException, the zk client in ZKRMStateStore waits for the specified period to pass and then retries the operations. At that point, the zk server should detect that the session has moved and close the client connection, as the ZooKeeper documentation mentions: http://zookeeper.apache.org/doc/r3.4.0/zookeeperProgrammers.html#ch_zkSessions {quote} When the delayed packet arrives at the first server, the old server detects that the session has moved, and closes the client connection. {quote} If this behaviour is not the same as described, we should fix ZooKeeper. 
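The distinction drawn in the comment above (retry in the same session for SessionMovedException, establish a new session only on a real expiry) can be sketched as below. Every name here is an illustrative stand-in, not ZooKeeper's or ZKRMStateStore's actual API.

```java
// Sketch of the retry distinction discussed above: transient errors such as a
// moved session are retried on the SAME handle after a wait, while only a
// genuine session expiry forces a brand-new session. All names are invented.
public class SessionRetrySketch {
    static class SessionMovedException extends Exception {}
    static class SessionExpiredException extends Exception {}

    interface Op { void run() throws Exception; }

    static long sessionId = 1;  // stand-in for the client's session handle

    static void runWithRetries(Op op, int maxRetries, long waitMs) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                op.run();
                return;
            } catch (SessionMovedException e) {
                // Transient: keep the same session id, just wait and retry so
                // the old server can notice the moved session and drop the
                // stale connection on its side.
                if (attempt >= maxRetries) throw e;
                Thread.sleep(waitMs);
            } catch (SessionExpiredException e) {
                // Only a real expiry establishes a new session.
                sessionId++;
                if (attempt >= maxRetries) throw e;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] failures = {2};  // fail twice with "session moved"
        runWithRetries(() -> {
            if (failures[0]-- > 0) throw new SessionMovedException();
        }, 5, 1);
        // Moved, not expired, so the session id is untouched.
        System.out.println("sessionId=" + sessionId);
    }
}
```

Creating a new session in the first catch block would reproduce exactly the failure mode reported on this issue.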
> ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.As
[jira] [Commented] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596963#comment-14596963 ] Hudson commented on YARN-3835: -- FAILURE: Integrated in Hadoop-trunk-Commit #8051 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8051/]) YARN-3835. hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml (vamsee via rkanter) (rkanter: rev 99271b762129d78c86f3c9733a24c77962b0b3f7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml > hadoop-yarn-server-resourcemanager test package bundles core-site.xml, > yarn-site.xml > > > Key: YARN-3835 > URL: https://issues.apache.org/jira/browse/YARN-3835 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Vamsee Yarlagadda >Assignee: Vamsee Yarlagadda >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3835.patch > > > It looks like by default yarn is bundling core-site.xml, yarn-site.xml in > test artifact of hadoop-yarn-server-resourcemanager which means that any > downstream project which uses this a dependency can have a problem in picking > up the user supplied/environment supplied core-site.xml, yarn-site.xml > So we should ideally exclude these .xml files from being bundled into the > test-jar. (Similar to YARN-1748) > I also proactively looked at other YARN modules where this might be > happening. > {code} > vamsee-MBP:hadoop-yarn-project vamsee$ find . 
-name "*-site.xml" > ./hadoop-yarn/conf/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml > {code} > And out of these only two modules (hadoop-yarn-server-resourcemanager, > hadoop-yarn-server-tests) are building test-jars. In future, if we start > building test-jar of other modules, we should exclude these xml files from > being bundled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
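For reference, the kind of exclusion described above is typically expressed through the maven-jar-plugin's test-jar configuration. This is a hedged sketch of the approach, not the exact hunk from YARN-3835.patch.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <executions>
    <execution>
      <goals><goal>test-jar</goal></goals>
      <configuration>
        <!-- keep environment-specific configs out of the published test-jar
             so downstream projects pick up their own *-site.xml files -->
        <excludes>
          <exclude>core-site.xml</exclude>
          <exclude>yarn-site.xml</exclude>
        </excludes>
      </configuration>
    </execution>
  </executions>
</plugin>
```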
[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596957#comment-14596957 ] Hadoop QA commented on YARN-3800: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 6s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 50s | The applied patch generated 7 new checkstyle issues (total was 55, now 56). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 51m 0s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 89m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741165/YARN-3800.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fac4e04 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8318/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8318/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8318/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8318/console | This message was automatically generated. > Simplify inmemory state for ReservationAllocation > - > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NMProxy should retry on NMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596946#comment-14596946 ] Hudson commented on YARN-3842: -- FAILURE: Integrated in Hadoop-trunk-Commit #8050 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8050/]) YARN-3842. NMProxy should retry on NMNotYetReadyException. (Robert Kanter via kasha) (kasha: rev 5ebf2817e58e1be8214dc1916a694a912075aa0a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java * hadoop-yarn-project/CHANGES.txt > NMProxy should retry on NMNotYetReadyException > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch, YARN-3842.002.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
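The idea of the fix, treating NMNotYetReadyException as retriable rather than as a task-attempt failure, can be sketched generically as below. The exception and helper names are illustrative; the real patch wires the policy into ServerProxy's retry configuration rather than an open-coded loop.

```java
// Generic sketch of the fix's idea: an NM that is up but has not yet
// re-registered with the RM is a retriable condition, not a fatal one, so an
// AM's container-launch attempt can survive the NM restart window.
// NmNotYetReadyException and callWithRetry are invented names.
import java.util.concurrent.Callable;

public class NmRetrySketch {
    static class NmNotYetReadyException extends Exception {}

    static <T> T callWithRetry(Callable<T> call, int maxRetries, long sleepMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (NmNotYetReadyException e) {
                // NM is restarting; back off and try again instead of failing
                // the task attempt.
                if (attempt >= maxRetries) throw e;
                Thread.sleep(sleepMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] notReady = {3};  // NM becomes ready after three probes
        String result = callWithRetry(() -> {
            if (notReady[0]-- > 0) throw new NmNotYetReadyException();
            return "launched";
        }, 10, 1);
        System.out.println(result);
    }
}
```

A bounded retry count still matters: if the NM never registers, the launch must eventually fail rather than spin forever.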
[jira] [Commented] (YARN-3748) Cleanup Findbugs volatile warnings
[ https://issues.apache.org/jira/browse/YARN-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596936#comment-14596936 ] Gabor Liptak commented on YARN-3748: Any other changes needed before this can be considered for commit? Thanks > Cleanup Findbugs volatile warnings > -- > > Key: YARN-3748 > URL: https://issues.apache.org/jira/browse/YARN-3748 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Gabor Liptak >Priority: Minor > Attachments: YARN-3748.1.patch, YARN-3748.2.patch, YARN-3748.3.patch, > YARN-3748.5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3842) NMProxy should retry on NMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3842: --- Summary: NMProxy should retry on NMNotYetReadyException (was: NM restarts could lead to app failures) > NMProxy should retry on NMNotYetReadyException > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch, YARN-3842.002.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596925#comment-14596925 ] Sangjin Lee commented on YARN-3792: --- The latest patch LGTM. Once Jenkins comes back, I'll go ahead and merge it. Folks, do let me know soon if you have any other feedback. Thanks! > Test case failures in TestDistributedShell and some issue fixes related to > ATSV2 > > > Key: YARN-3792 > URL: https://issues.apache.org/jira/browse/YARN-3792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-3792-YARN-2928.001.patch, > YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch, > YARN-3792-YARN-2928.004.patch > > > # encountered [testcase > failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] > which was happening even without the patch modifications in YARN-3044 > TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow > TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow > TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression > # Remove unused {{enableATSV1}} in testDisstributedShell > # container metrics needs to be published only for v2 test cases of > testDisstributedShell > # Nullpointer was thrown in TimelineClientImpl.constructResURI when Aux > service was not configured and {{TimelineClient.putObjects}} was getting > invoked. > # Race condition for the Application events to published and test case > verification for RM's ApplicationFinished Timeline Events > # Application Tags for converted to lowercase in > ApplicationSubmissionContextPBimpl, hence RMTimelinecollector was not able to > detect to custom flow details of the app -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596910#comment-14596910 ] Naganarasimha G R commented on YARN-2801: - Hi [~leftnoteasy], after escaping the links, the patch now seems to apply. A few nits: * ??User need configure how many resources?? => {{User need configure how much resource of each partition}} * The points in the note after the configuration section need to be formatted as a list. * ??application can use following Java APIs?? => ??Application can use following Java APIs?? Apart from these, the rest looks fine! > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596894#comment-14596894 ] Hadoop QA commented on YARN-3635: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 25s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 0s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 18 new checkstyle issues (total was 204, now 215). | | {color:red}-1{color} | whitespace | 0m 3s | The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 30s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 14s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 89m 34s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741145/YARN-3635.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 077250d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8316/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8316/console | This message was automatically generated. > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch, YARN-3635.5.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. 
> One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
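The proposal above, exposing queue mapping through a common scheduler-facing interface so RMAppManager can resolve the final queue before validating resource requests, might look roughly like the sketch below. Every name here is invented for illustration and is not the actual YARN API.

```java
// Hypothetical sketch: put get-queue-mapping behind a common interface so the
// caller (an RMAppManager-style component) validates maximum_allocation and
// node-label limits against the POST-mapping queue, not the requested one.
import java.util.Map;

public class QueueMappingSketch {
    interface QueuePlacement {
        /** Returns the queue the application will actually run in. */
        String getMappedQueue(String user, String requestedQueue);
    }

    // A trivial scheduler-side implementation backed by a user->queue table,
    // standing in for fair/capacity scheduler mapping rules.
    static class TableBackedPlacement implements QueuePlacement {
        private final Map<String, String> userToQueue;
        TableBackedPlacement(Map<String, String> userToQueue) {
            this.userToQueue = userToQueue;
        }
        public String getMappedQueue(String user, String requestedQueue) {
            return userToQueue.getOrDefault(user, requestedQueue);
        }
    }

    // RMAppManager-style flow: map first, then run validations against the
    // queue that was actually selected.
    static String resolveForValidation(QueuePlacement p, String user, String q) {
        return p.getMappedQueue(user, q);
    }

    public static void main(String[] args) {
        QueuePlacement p = new TableBackedPlacement(Map.of("alice", "analytics"));
        System.out.println(resolveForValidation(p, "alice", "default")); // analytics
        System.out.println(resolveForValidation(p, "bob", "default"));   // default
    }
}
```

With this shape, {{validateAndCreateResourceRequest}} would always see the mapped queue, which is exactly the bug the issue describes.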
[jira] [Updated] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-3705: --- Attachment: YARN-3705.002.patch I've attached 002, which addresses the whitespace warnings. The TestWorkPreservingRMRestart failure is unrelated to the code path the patch fixes. > forcemanual transitionToStandby in RM-HA automatic-failover mode should > change elector state > > > Key: YARN-3705 > URL: https://issues.apache.org/jira/browse/YARN-3705 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > Attachments: YARN-3705.001.patch, YARN-3705.002.patch > > > Executing {{rmadmin -transitionToStandby --forcemanual}} in > automatic-failover.enabled mode makes ResouceManager standby while keeping > the state of ActiveStandbyElector. It should make elector to quit and rejoin > in order to enable other candidates to promote, otherwise forcemanual > transition should not be allowed in automatic-failover mode in order to avoid > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596886#comment-14596886 ] Naganarasimha G R commented on YARN-2801: - Hi [~leftnoteasy], it seems that mvn site is failing after the patch is applied. > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2801: Assignee: Wangda Tan (was: Naganarasimha G R) > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-2801: --- Assignee: Naganarasimha G R (was: Wangda Tan) > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Naganarasimha G R > Attachments: YARN-2801.1.patch, YARN-2801.2.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3792: Attachment: YARN-3792-YARN-2928.004.patch Hi [~sjlee0], I have corrected the whitespace and findbugs issue in Client.java and am attaching a patch for it; the remaining warnings do not seem to be a problem, or would require unnecessary checks to address. > Test case failures in TestDistributedShell and some issue fixes related to > ATSV2 > > > Key: YARN-3792 > URL: https://issues.apache.org/jira/browse/YARN-3792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-3792-YARN-2928.001.patch, > YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch, > YARN-3792-YARN-2928.004.patch > > > # encountered [testcase > failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] > which was happening even without the patch modifications in YARN-3044 > TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow > TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow > TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression > # Remove unused {{enableATSV1}} in testDisstributedShell > # container metrics needs to be published only for v2 test cases of > testDisstributedShell > # Nullpointer was thrown in TimelineClientImpl.constructResURI when Aux > service was not configured and {{TimelineClient.putObjects}} was getting > invoked. > # Race condition for the Application events to published and test case > verification for RM's ApplicationFinished Timeline Events > # Application Tags for converted to lowercase in > ApplicationSubmissionContextPBimpl, hence RMTimelinecollector was not able to > detect to custom flow details of the app -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596867#comment-14596867 ] Hadoop QA commented on YARN-3842: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 16s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 28s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 27s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 49m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741154/YARN-3842.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 077250d | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8317/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8317/console | This message was automatically generated. > NM restarts could lead to app failures > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch, YARN-3842.002.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
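The scenario in the quoted description is, at bottom, a launch-side retry problem: if the client keeps retrying while the restarted NM finishes re-registering with the RM, the transient NMNotYetReadyException never has to surface as a task-attempt failure. A minimal sketch of that retry pattern follows; it is illustrative only, and RetryLaunch, Launcher, and NotYetReadyException are hypothetical stand-ins, not the actual NMProxy/ContainerManager API or the YARN-3842 patch.

```java
// Hedged sketch of the generic "retry while the NM re-registers" pattern.
// All names here are illustrative stand-ins, not real YARN classes.
public class RetryLaunch {
    static class NotYetReadyException extends RuntimeException {}

    interface Launcher {
        void startContainer();
    }

    // Try the launch up to (maxRetries + 1) times, backing off between
    // attempts; only give up if the NM never becomes ready.
    static boolean launchWithRetry(Launcher l, int maxRetries, long sleepMs)
            throws InterruptedException {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                l.startContainer();
                return true;            // launch succeeded
            } catch (NotYetReadyException e) {
                Thread.sleep(sleepMs);  // NM still registering; back off and retry
            }
        }
        return false;                   // exhausted retries
    }
}
```

With a policy like this on the client side, a single NM restart costs a short delay instead of a counted task-attempt failure.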
[jira] [Resolved] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon resolved YARN-3843. - Resolution: Duplicate Fix Version/s: 2.8.0 Target Version/s: 2.8.0 > Fair Scheduler should not accept apps with space keys as queue name > --- > > Key: YARN-3843 > URL: https://issues.apache.org/jira/browse/YARN-3843 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.0, 2.5.0 >Reporter: Dongwook Kwon >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3843.01.patch > > > As YARN-461, since empty string queue name is not valid, queue name with > space keys such as " " ," " should not be accepted either, also not as > prefix nor postfix. > e.g) "root.test.queuename ", or "root.test. queuename" > I have 2 specific cases kill RM with these space keys as part of queue name. > 1) Without placement policy (hadoop 2.4.0 and above), > When a job is submitted with " "(space key) as queue name > e.g) mapreduce.job.queuename=" " > 2) With placement policy (hadoop 2.5.0 and above) > Once a job is submitted without space key as queue name, and submit another > job with space key. > e.g) 1st time: mapreduce.job.queuename="root.test.user1" > 2nd time: mapreduce.job.queuename="root.test.user1 " > {code} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.724 sec <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source > QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.(FSQueue.java:56) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.(FSLeafQueue.java:66) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596858#comment-14596858 ] Dongwook Kwon commented on YARN-3843: - Thanks, you're right, it's duplicated. I didn't find the other jira case, I will close it. > Fair Scheduler should not accept apps with space keys as queue name > --- > > Key: YARN-3843 > URL: https://issues.apache.org/jira/browse/YARN-3843 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.0, 2.5.0 >Reporter: Dongwook Kwon >Priority: Minor > Attachments: YARN-3843.01.patch > > > As YARN-461, since empty string queue name is not valid, queue name with > space keys such as " " ," " should not be accepted either, also not as > prefix nor postfix. > e.g) "root.test.queuename ", or "root.test. queuename" > I have 2 specific cases kill RM with these space keys as part of queue name. > 1) Without placement policy (hadoop 2.4.0 and above), > When a job is submitted with " "(space key) as queue name > e.g) mapreduce.job.queuename=" " > 2) With placement policy (hadoop 2.5.0 and above) > Once a job is submitted without space key as queue name, and submit another > job with space key. > e.g) 1st time: mapreduce.job.queuename="root.test.user1" > 2nd time: mapreduce.job.queuename="root.test.user1 " > {code} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.724 sec <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source > QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.(FSQueue.java:56) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.(FSLeafQueue.java:66) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon updated YARN-3843: Attachment: YARN-3843.01.patch > Fair Scheduler should not accept apps with space keys as queue name > --- > > Key: YARN-3843 > URL: https://issues.apache.org/jira/browse/YARN-3843 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.0, 2.5.0 >Reporter: Dongwook Kwon >Priority: Minor > Attachments: YARN-3843.01.patch > > > As YARN-461, since empty string queue name is not valid, queue name with > space keys such as " " ," " should not be accepted either, also not as > prefix nor postfix. > e.g) "root.test.queuename ", or "root.test. queuename" > I have 2 specific cases kill RM with these space keys as part of queue name. > 1) Without placement policy (hadoop 2.4.0 and above), > When a job is submitted with " "(space key) as queue name > e.g) mapreduce.job.queuename=" " > 2) With placement policy (hadoop 2.5.0 and above) > Once a job is submitted without space key as queue name, and submit another > job with space key. > e.g) 1st time: mapreduce.job.queuename="root.test.user1" > 2nd time: mapreduce.job.queuename="root.test.user1 " > {code} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.724 sec <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source > QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.(FSQueue.java:56) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.(FSLeafQueue.java:66) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596848#comment-14596848 ] zhihai xu commented on YARN-3843: - Hi [~dongwook], thanks for reporting this issue. I think this issue was fixed at YARN-3241. > Fair Scheduler should not accept apps with space keys as queue name > --- > > Key: YARN-3843 > URL: https://issues.apache.org/jira/browse/YARN-3843 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.0, 2.5.0 >Reporter: Dongwook Kwon >Priority: Minor > > As YARN-461, since empty string queue name is not valid, queue name with > space keys such as " " ," " should not be accepted either, also not as > prefix nor postfix. > e.g) "root.test.queuename ", or "root.test. queuename" > I have 2 specific cases kill RM with these space keys as part of queue name. > 1) Without placement policy (hadoop 2.4.0 and above), > When a job is submitted with " "(space key) as queue name > e.g) mapreduce.job.queuename=" " > 2) With placement policy (hadoop 2.5.0 and above) > Once a job is submitted without space key as queue name, and submit another > job with space key. > e.g) 1st time: mapreduce.job.queuename="root.test.user1" > 2nd time: mapreduce.job.queuename="root.test.user1 " > {code} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.724 sec <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source > QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.(FSQueue.java:56) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.(FSLeafQueue.java:66) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596842#comment-14596842 ] Dongwook Kwon commented on YARN-3843: - From my investigation, QueueMetrics doesn't allow space characters at the start or end of names; it just trims them: static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L112 https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L85 So, in FairScheduler, the queue name "root.adhoc.birvine " (with a trailing space) is treated as different from "root.adhoc.birvine" because it has one more character, but in QueueMetrics, because names are trimmed, the 2 different queue names suddenly become the same, which causes the error "Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!" > Fair Scheduler should not accept apps with space keys as queue name > --- > > Key: YARN-3843 > URL: https://issues.apache.org/jira/browse/YARN-3843 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.0, 2.5.0 >Reporter: Dongwook Kwon >Priority: Minor > > As YARN-461, since empty string queue name is not valid, queue name with > space keys such as " " ," " should not be accepted either, also not as > prefix nor postfix. > e.g) "root.test.queuename ", or "root.test. queuename" > I have 2 specific cases kill RM with these space keys as part of queue name. 
> 1) Without placement policy (hadoop 2.4.0 and above), > When a job is submitted with " "(space key) as queue name > e.g) mapreduce.job.queuename=" " > 2) With placement policy (hadoop 2.5.0 and above) > Once a job is submitted without space key as queue name, and submit another > job with space key. > e.g) 1st time: mapreduce.job.queuename="root.test.user1" > 2nd time: mapreduce.job.queuename="root.test.user1 " > {code} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.724 sec <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source > QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.(FSQueue.java:56) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.(FSLeafQueue.java:66) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
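The trimming collision described in the comment above can be reproduced without running the scheduler at all. The sketch below is a plain-Java stand-in for Guava's Splitter.on('.').omitEmptyStrings().trimResults() as used by QueueMetrics; the class and method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in (not the real QueueMetrics code) for how
// Splitter.on('.').omitEmptyStrings().trimResults() normalizes queue names
// into metrics-source name components.
public class QueueNameTrim {
    static List<String> splitQueueName(String queueName) {
        List<String> parts = new ArrayList<>();
        for (String part : queueName.split("\\.")) {
            String trimmed = part.trim();   // trimResults()
            if (!trimmed.isEmpty()) {       // omitEmptyStrings()
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // FairScheduler sees two distinct queue names (one has a trailing
        // space), but after trimming both map to the same components, which
        // is what triggers "Metrics source ... already exists!".
        System.out.println(splitQueueName("root.adhoc.birvine")
                .equals(splitQueueName("root.adhoc.birvine ")));  // prints true
    }
}
```

Because the scheduler compares raw strings while the metrics layer compares trimmed ones, any queue name differing only by surrounding whitespace collides at registration time.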
[jira] [Updated] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3800: Attachment: YARN-3800.002.patch Addressed feedback. > Simplify inmemory state for ReservationAllocation > - > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as that's the only thing we need. Ultimately we convert > everything to resources anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
Dongwook Kwon created YARN-3843: --- Summary: Fair Scheduler should not accept apps with space keys as queue name Key: YARN-3843 URL: https://issues.apache.org/jira/browse/YARN-3843 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.0, 2.4.0 Reporter: Dongwook Kwon Priority: Minor As in YARN-461, since an empty-string queue name is not valid, queue names with space characters such as " ", " " should not be accepted either, not even as a prefix or suffix. e.g) "root.test.queuename ", or "root.test. queuename" I have 2 specific cases that kill the RM with these space characters as part of the queue name. 1) Without placement policy (hadoop 2.4.0 and above): a job is submitted with " " (a space) as the queue name, e.g) mapreduce.job.queuename=" " 2) With placement policy (hadoop 2.5.0 and above): first a job is submitted without a space in the queue name, then another job is submitted with one. e.g) 1st time: mapreduce.job.queuename="root.test.user1" 2nd time: mapreduce.job.queuename="root.test.user1 " {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.724 sec <<< ERROR! org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! 
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.(FSQueue.java:56) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.(FSLeafQueue.java:66) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2862) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used
[ https://issues.apache.org/jira/browse/YARN-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596837#comment-14596837 ] Ming Ma commented on YARN-2862: --- Thanks, [~rohithsharma] and [~leftnoteasy]. Yes, YARN-3410 will be useful. So admins still need to look through RM logs to identify those apps. Will it be useful to provide a new RM startup option to delete or skip such apps automatically? > RM might not start if the machine was hard shutdown and > FileSystemRMStateStore was used > --- > > Key: YARN-2862 > URL: https://issues.apache.org/jira/browse/YARN-2862 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ming Ma > > This might be a known issue. Given FileSystemRMStateStore isn't used for HA > scenario, it might not be that important, unless there is something we need > to fix at RM layer to make it more tolerant to RMStore issue. > When RM was hard shutdown, OS might not get a chance to persist blocks. Some > of the stored application data end up with size zero after reboot. And RM > didn't like that. > {noformat} > ls -al > /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351 > total 156 > drwxr-xr-x.2 x y 4096 Nov 13 16:45 . > drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 .. > -rw-r--r--.1 x y 0 Nov 13 16:45 > appattempt_1412702189634_324351_01 > -rw-r--r--.1 x y 0 Nov 13 16:45 > .appattempt_1412702189634_324351_01.crc > -rw-r--r--.1 x y 0 Nov 13 16:45 application_1412702189634_324351 > -rw-r--r--.1 x y 0 Nov 13 16:45 .application_1412702189634_324351.crc > {noformat} > When RM starts up > {noformat} > 2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem > opening checksum file: > file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351. 
> Ignoring exception: > java.io.EOFException > at java.io.DataInputStream.readFully(DataInputStream.java:197) > at java.io.DataInputStream.readFully(DataInputStream.java:169) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501) > ... > 2014-11-13 17:40:48,876 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to > load/recover state > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
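One possible shape for the "delete or skip such apps automatically" startup option asked about above: treat a zero-length application state file as unrecoverable and skip it, rather than letting a NullPointerException abort the whole recovery. This is a hypothetical sketch under that assumption, not the actual FileSystemRMStateStore code; RecoveryScan and recoverableAppFiles are invented names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical recovery guard: a hard shutdown can leave zero-byte state
// files (as shown in the ls output above), so an opt-in mode could filter
// them out before attempting to parse application records.
public class RecoveryScan {
    static List<Path> recoverableAppFiles(Path appRoot) throws IOException {
        List<Path> ok = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(appRoot)) {
            for (Path p : entries) {
                if (Files.size(p) == 0) {
                    // Nothing was persisted before the crash; the record is
                    // unreadable, so log and skip instead of failing startup.
                    System.err.println("Skipping truncated state file: " + p);
                    continue;
                }
                ok.add(p);
            }
        }
        return ok;
    }
}
```

Admins would still want the skipped paths logged, since each one is an application whose state was silently lost.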
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596783#comment-14596783 ] Karthik Kambatla commented on YARN-3842: +1, pending Jenkins. Thanks for your review, [~jianhe]. I'll go ahead and commit this if Jenkins is fine with it. > NM restarts could lead to app failures > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch, YARN-3842.002.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596776#comment-14596776 ] Jian He commented on YARN-3842: --- I think the latest patch is safe for 2.7.1, +1 > NM restarts could lead to app failures > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch, YARN-3842.002.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-3842: Attachment: YARN-3842.002.patch The new patch makes the changes Karthik suggested. I also added a few comments and renamed {{isExpectingNMNotYetReadyException}} to {{shouldThrowNMNotYetReadyException}} for clarity. > NM restarts could lead to app failures > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch, YARN-3842.002.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596765#comment-14596765 ] Robert Kanter commented on YARN-3842: - I had sort of just split {{startContainers}} into two sections (one for each part of the test), but this is a lot more concise. I'll do that. > NM restarts could lead to app failures > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596756#comment-14596756 ] Karthik Kambatla commented on YARN-3842: Thanks for the quick turnaround on this, Robert. One nit-pick on the test: would the following be more concise? {code} if (retryCount < 5) { retryCount++; if (isExpectingNMNotYetReadyException) { containerManager.setBlockNewContainerRequests(true); } else { throw new java.net.ConnectException("start container exception"); } } else { containerManager.setBlockNewContainerRequests(false); } return super.startContainers(requests); {code} > NM restarts could lead to app failures > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3635: - Attachment: YARN-3635.5.patch Attached ver.5, fixed a bunch of warnings. > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch, YARN-3635.5.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. > One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596733#comment-14596733 ] Hadoop QA commented on YARN-3842: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 30s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 5s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 49m 40s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741131/YARN-3842.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8314/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8314/console | This message was automatically generated. > NM restarts could lead to app failures > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596730#comment-14596730 ] Sangjin Lee commented on YARN-3792: --- Thanks [~Naganarasimha] for the update! +1 on the test failure. It appears to be an issue unrelated to the timeline service. It does seem like the whitespace is related to the patch (or in the vicinity of the patch). Could you kindly do a quick change to remove those extra spaces? Also, for findbugs, I ran findbugs against those two projects (distributed shell and resource manager). I do see several findbugs warnings, and they are not introduced by this patch but do appear to be related to the YARN-2928 work. distributed shell: {code} {code} resource manager: {code} {code} It would be nice to address them (at least the one on Client.java) here, but if you're not inclined, we could do it later... Let me know how you want to proceed. > Test case failures in TestDistributedShell and some issue fixes related to > ATSV2 > > > Key: YARN-3792 > URL: https://issues.apache.org/jira/browse/YARN-3792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-3792-YARN-2928.001.patch, > YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch > > > # encountered [testcase > failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] > which was happening even without the patch modifications in YARN-3044 > TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow > TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow > TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression > # Remove unused {{enableATSV1}} in testDisstributedShell > # container metrics needs to be published only for v2 test cases of > testDisstributedShell > # Nullpointer was thrown in TimelineClientImpl.constructResURI when Aux > service was not configured and 
{{TimelineClient.putObjects}} was getting > invoked. > # Race condition for the Application events to be published and test case > verification for RM's ApplicationFinished Timeline Events > # Application Tags are converted to lowercase in > ApplicationSubmissionContextPBimpl, hence RMTimelinecollector was not able to > detect the custom flow details of the app -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3360) Add JMX metrics to TimelineDataManager
[ https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596717#comment-14596717 ] Jason Lowe commented on YARN-3360: -- The checkstyle comments are complaining about existing method argument lengths or the visibility of the Metrics fields. I was replicating the same style used by all other metric fields, so this is consistent with the code base. > Add JMX metrics to TimelineDataManager > -- > > Key: YARN-3360 > URL: https://issues.apache.org/jira/browse/YARN-3360 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Labels: BB2015-05-TBR > Attachments: YARN-3360.001.patch, YARN-3360.002.patch, > YARN-3360.003.patch > > > The TimelineDataManager currently has no metrics, outside of the standard JVM > metrics. It would be very useful to at least log basic counts of method > calls, time spent in those calls, and number of entities/events involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596706#comment-14596706 ] Hadoop QA commented on YARN-2801: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 3m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | site | 1m 58s | Site compilation is broken. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 5m 26s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741138/YARN-2801.2.patch | | Optional Tests | site | | git revision | trunk / 11ac848 | | site | https://builds.apache.org/job/PreCommit-YARN-Build/8315/artifact/patchprocess/patchSiteWarnings.txt | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8315/console | This message was automatically generated. > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2801: - Attachment: YARN-2801.2.patch Hi [~Naganarasimha], thanks for your thoughtful review. Responses to your suggestions: 2) There's no preemption-related documentation in Apache Hadoop yet; I suggest adding this part after we have a preemption page. 10) They're what the admin should specify. I prefer not to add default values here because defaults keep changing; they are tracked in {{yarn-default.xml}}. 12) Changed it; it should be the percentage of resources on nodes with the DEFAULT partition. 13) That's different: {{}} and not specified mean "inherit from parent". 18) The REST API is under development, and I think we still need some time to finalize it for 2.8, so I suggest adding that part later. 19) Added a CS link from the node labels page. I think it's a relatively independent feature, so I suggest not referencing it from the CS docs. I addressed the other items in the attached patch. Please let me know your ideas. Thanks, > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596663#comment-14596663 ] Subru Krishnan commented on YARN-3800: -- Thanks [~adhoot] for the patch. I looked at it & just had a couple of comments: 1. Can we have _toResource(ReservationRequest request)_ in a Reservation utility class rather than in _InMemoryReservationAllocation_ 2. I feel we can update the constructor of _InMemoryReservationAllocation_ to take in _Map_ instead of _Map_ so that we do the translation only once. This should simplify the state in GreedyReservationAgent also. > Simplify inmemory state for ReservationAllocation > - > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3800.001.patch, YARN-3800.002.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
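Subru's second point, translating the reservation request into a {{Resource}} once in the constructor instead of on every access, can be sketched as follows. The types below are simplified stand-ins for the YARN classes (the real generics were elided in the comment above), so this is an illustration of the normalize-at-construction pattern, not the actual {{InMemoryReservationAllocation}} code:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in types: a "request" carries per-container memory and a container
// count; the allocation stores only the flattened total Resource per
// interval, so the multiplication happens exactly once, at construction.
public class InMemoryAllocationSketch {
    static class Request {
        final int memoryMb, numContainers;
        Request(int memoryMb, int numContainers) {
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    static class Resource {
        final int memoryMb;
        Resource(int memoryMb) { this.memoryMb = memoryMb; }
    }

    private final Map<String, Resource> byInterval = new HashMap<>();

    // Translate once here instead of every time the state is read.
    InMemoryAllocationSketch(Map<String, Request> requests) {
        for (Map.Entry<String, Request> e : requests.entrySet()) {
            Request r = e.getValue();
            byInterval.put(e.getKey(),
                new Resource(r.memoryMb * r.numContainers));
        }
    }

    Resource resourcesAt(String interval) { return byInterval.get(interval); }

    public static void main(String[] args) {
        Map<String, Request> reqs = new HashMap<>();
        reqs.put("t0-t1", new Request(1024, 4));
        InMemoryAllocationSketch alloc = new InMemoryAllocationSketch(reqs);
        System.out.println(alloc.resourcesAt("t0-t1").memoryMb); // 4096
    }
}
```

Readers of the state (here {{resourcesAt}}) then never see request objects at all, which is what simplifies the agent-side bookkeeping.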
[jira] [Updated] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-3842: Attachment: YARN-3842.001.patch That makes sense. The patch is also a lot simpler; it just adds a retry policy for {{NMNotYetReadyException}}, and a test. > NM restarts could lead to app failures > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, > YARN-3842.001.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
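The retry-policy approach in the patch can be illustrated with a minimal, self-contained sketch. This is not the Hadoop {{RetryPolicies}} API; the helper name is hypothetical, and {{IllegalStateException}} stands in for {{NMNotYetReadyException}}:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of retrying an operation on one designated exception type, in the
// spirit of adding a retry policy for NMNotYetReadyException: the transient
// "NM not registered yet" failure is retried instead of surfacing to the AM
// as a task attempt failure.
public class RetryOnException {
    static <T> T callWithRetries(Callable<T> op,
                                 Class<? extends Exception> retryable,
                                 int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Only retry the designated exception, up to maxRetries times;
                // anything else (e.g. a real connect failure) propagates.
                if (!retryable.isInstance(e) || attempt >= maxRetries) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Fails twice (NM "not yet ready"), then succeeds on the third call.
        String result = callWithRetries(() -> {
            if (calls.incrementAndGet() <= 2) {
                throw new IllegalStateException("NM not yet ready");
            }
            return "container started";
        }, IllegalStateException.class, 5);
        System.out.println(result + " after " + calls.get() + " attempts");
        // prints: container started after 3 attempts
    }
}
```

A real policy would also sleep between attempts; that is omitted here to keep the sketch short.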
[jira] [Moved] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla moved MAPREDUCE-6409 to YARN-3842: --- Target Version/s: 2.7.1 (was: 2.7.1) Affects Version/s: (was: 2.7.0) 2.7.0 Key: YARN-3842 (was: MAPREDUCE-6409) Project: Hadoop YARN (was: Hadoop Map/Reduce) > NM restarts could lead to app failures > -- > > Key: YARN-3842 > URL: https://issues.apache.org/jira/browse/YARN-3842 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter >Priority: Critical > Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch > > > Consider the following scenario: > 1. RM assigns a container on node N to an app A. > 2. Node N is restarted > 3. A tries to launch container on node N. > 3 could lead to an NMNotYetReadyException depending on whether NM N has > registered with the RM. In MR, this is considered a task attempt failure. A > few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3360) Add JMX metrics to TimelineDataManager
[ https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596630#comment-14596630 ] Hadoop QA commented on YARN-3360: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 25s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 29s | The applied patch generated 19 new checkstyle issues (total was 7, now 26). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 59s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 3m 10s | Tests passed in hadoop-yarn-server-applicationhistoryservice. 
| | | | 39m 36s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741115/YARN-3360.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8313/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8313/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8313/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8313/console | This message was automatically generated. > Add JMX metrics to TimelineDataManager > -- > > Key: YARN-3360 > URL: https://issues.apache.org/jira/browse/YARN-3360 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Labels: BB2015-05-TBR > Attachments: YARN-3360.001.patch, YARN-3360.002.patch, > YARN-3360.003.patch > > > The TimelineDataManager currently has no metrics, outside of the standard JVM > metrics. It would be very useful to at least log basic counts of method > calls, time spent in those calls, and number of entities/events involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596616#comment-14596616 ] Ted Yu commented on YARN-3815: -- bq. in the spirit of readless increments as used in Tephra The readless increment feature is implemented in CDAP, where it is called a delta write. Please take a look at: cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementHandler.java cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementSummingScanner.java The implementation uses an HBase coprocessor, BTW. > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level, expect return: aggregated stats for a flow_run, flow_version > and flow > - User level, expect return: aggregated stats for applications submitted by > user > - Queue level, expect return: aggregated stats for applications within the > Queue > Application states is the basic building block for all other level > aggregations. We can provide Flow/User/Queue level aggregated statistics info > based on application states (a dedicated table for application states is > needed which is missing from previous design documents like HBase/Phoenix > schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
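The readless-increment idea Ted references can be sketched independently of HBase and CDAP: the write path appends a delta without reading the current value, and the read path (in HBase, a coprocessor such as the IncrementSummingScanner mentioned above) sums the deltas. This is a toy in-memory model, not the CDAP implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of readless increments: each increment is appended as a delta
// cell (no read-modify-write on the write path); reads sum the deltas, as a
// scan-time coprocessor would, and "compaction" collapses them.
public class ReadlessIncrements {
    private final Map<String, List<Long>> deltas = new HashMap<>();

    // Write path: O(1) append, no read required, so writers never contend
    // on the current value.
    public void increment(String key, long delta) {
        deltas.computeIfAbsent(key, k -> new ArrayList<>()).add(delta);
    }

    // Read path: sum all delta cells for the key.
    public long get(String key) {
        long sum = 0;
        for (long d : deltas.getOrDefault(key, Collections.<Long>emptyList())) {
            sum += d;
        }
        return sum;
    }

    // Compaction: collapse the delta list into a single cell.
    public void compact(String key) {
        long total = get(key);
        List<Long> one = new ArrayList<>();
        one.add(total);
        deltas.put(key, one);
    }

    public static void main(String[] args) {
        ReadlessIncrements counters = new ReadlessIncrements();
        counters.increment("app.containers", 4);
        counters.increment("app.containers", 3);
        System.out.println(counters.get("app.containers")); // 7
        counters.compact("app.containers");
        counters.increment("app.containers", 1);
        System.out.println(counters.get("app.containers")); // 8
    }
}
```

The trade-off is exactly the one relevant to timeline aggregation: cheap, contention-free writes in exchange for read-time summing until compaction runs.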
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596620#comment-14596620 ] Hadoop QA commented on YARN-3635: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 14s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 48s | The applied patch generated 2 additional warning messages. | | {color:red}-1{color} | release audit | 0m 18s | The applied patch generated 4 release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 47s | The applied patch generated 18 new checkstyle issues (total was 204, now 215). | | {color:red}-1{color} | whitespace | 0m 3s | The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 27s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 61m 8s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 99m 39s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741096/YARN-3635.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/diffJavadocWarnings.txt | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8311/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8311/console | This message was automatically generated. 
> Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. > One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-110) AM releases too many containers due to the protocol
[ https://issues.apache.org/jira/browse/YARN-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596614#comment-14596614 ] Giovanni Matteo Fumarola commented on YARN-110: --- [~acmurthy], [~vinodkv] any updates on this? If you don't mind, can I work on this? > AM releases too many containers due to the protocol > --- > > Key: YARN-110 > URL: https://issues.apache.org/jira/browse/YARN-110 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Attachments: YARN-110.patch > > > - AM sends request asking 4 containers on host H1. > - Asynchronously, host H1 reaches RM and gets assigned 4 containers. RM at > this point, sets the value against H1 to > zero in its aggregate request-table for all apps. > - In the mean-while AM gets to need 3 more containers, so a total of 7 > including the 4 from previous request. > - Today, AM sends the absolute number of 7 against H1 to RM as part of its > request table. > - RM seems to be overriding its earlier value of zero against H1 to 7 against > H1. And thus allocating 7 more > containers. > - AM already gets 4 in this scheduling iteration, but gets 7 more, a total of > 11 instead of the required 7. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
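The over-allocation in the issue description can be reproduced with a tiny simulation of the absolute-number protocol. Method and field names here are illustrative, not the real AM/RM allocate API:

```java
// Simulates the YARN-110 race: the AM reports its outstanding need as an
// absolute number, so an ask that crosses an in-flight allocation gets
// double-counted by the RM.
public class AbsoluteAskRace {
    static int rmTableH1 = 0;   // RM's aggregate request count for host H1
    static int allocated = 0;   // containers handed to the AM so far

    // AM's ask overwrites the RM's table with the absolute number.
    static void amAsk(int absolute) { rmTableH1 = absolute; }

    // RM assigns everything in the table and zeroes it.
    static void rmAllocate() { allocated += rmTableH1; rmTableH1 = 0; }

    public static void main(String[] args) {
        amAsk(4);        // AM wants 4 containers on H1
        rmAllocate();    // H1 heartbeats: RM assigns 4, table goes to 0
        amAsk(7);        // AM now needs 3 more, but sends the absolute 7
        rmAllocate();    // RM assigns 7 more: 11 total instead of 7
        System.out.println("allocated = " + allocated); // prints 11
    }
}
```

If the AM instead sent the delta (3) in the second ask, the total would come out to 7, which is the fix this protocol discussion is circling.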
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596606#comment-14596606 ] zhihai xu commented on YARN-3798: - I think we should also create a new session for SessionMovedException. We hit SessionMovedException before; the cause we found was the following: # The ZK client tried to connect to leader L. The network was very slow, so the client disconnected before the leader processed the request. # The client then re-connected to follower F, reusing the same session ID. That was successful. # The request from step 1 then reached the leader, which processed it and invalidated the connection created in step 2, but the client didn't know its connection had been invalidated. # The client then got SessionMovedException whenever it used the invalidated connection for any ZooKeeper operation. IMHO, the only way to recover from this error on the RM side is to treat SessionMovedException like SessionExpiredException: close the current ZK client and create a new one. 
> ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java
[jira] [Updated] (YARN-3360) Add JMX metrics to TimelineDataManager
[ https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3360: - Attachment: YARN-3360.003.patch Rebased patch on trunk > Add JMX metrics to TimelineDataManager > -- > > Key: YARN-3360 > URL: https://issues.apache.org/jira/browse/YARN-3360 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Labels: BB2015-05-TBR > Attachments: YARN-3360.001.patch, YARN-3360.002.patch, > YARN-3360.003.patch > > > The TimelineDataManager currently has no metrics, outside of the standard JVM > metrics. It would be very useful to at least log basic counts of method > calls, time spent in those calls, and number of entities/events involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
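The kind of per-call counters and timings described above could be sketched roughly as below. This is an illustrative assumption, not YARN-3360's actual patch: the class and metric names are hypothetical, and the real implementation would register these with Hadoop's metrics2 system so they surface over JMX, rather than keeping plain fields.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of TimelineDataManager call metrics: a count of
// getEntities() invocations and the total time spent in them. In a real
// patch these would be metrics2 mutable metrics exposed via JMX.
public class TimelineDataManagerMetrics {
  private final AtomicLong getEntitiesOps = new AtomicLong();
  private final AtomicLong getEntitiesTimeMs = new AtomicLong();

  // Called once per getEntities() invocation with its elapsed time.
  public void recordGetEntities(long elapsedMs) {
    getEntitiesOps.incrementAndGet();
    getEntitiesTimeMs.addAndGet(elapsedMs);
  }

  public long getGetEntitiesOps() { return getEntitiesOps.get(); }
  public long getGetEntitiesTimeMs() { return getEntitiesTimeMs.get(); }
}
```

The same pattern would extend naturally to the other methods (getEntity, getEvents, postEntities) and to entity/event counts.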
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596522#comment-14596522 ] Varun Saxena commented on YARN-3798: Thanks [~ozawa]. Your explanation and the subsequent discussions with [~rakeshr] helped a lot in clarifying ZooKeeper's behavior.
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596481#comment-14596481 ] Jian He commented on YARN-1963: --- I think we need to move this forward. Overall, I prefer numeric priority to label-based priority: the former is simpler and more flexible if a user wants to define a wide range of priorities, and it needs no extra configs. Users also do not need to be re-educated about the mapping every time it changes. One problem is that if we refresh the priority mapping while some existing long-running jobs are already running at a certain priority, how do we map the previous priority range to the new one? In addition, if everyone runs applications at "VERY_HIGH" priority, then "HIGH", despite its name, is not really high any more; it effectively becomes the lowest priority. My point is that the importance of a priority only makes sense relative to its peers. In that sense, adding a utility that surfaces how applications are distributed across priorities, so users can reason about where to place an application, may be more useful than a static naming mapping that lets people infer relative importance from names. > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > Attachments: 0001-YARN-1963-prototype.patch, YARN Application > Priorities Design.pdf, YARN Application Priorities Design_01.pdf > > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596470#comment-14596470 ] Hadoop QA commented on YARN-3798: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741098/YARN-3798-2.7.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8312/console | This message was automatically generated.
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3798: -- Attachment: YARN-3798-2.7.002.patch
[jira] [Commented] (YARN-3820) Collect disks usages on the node
[ https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596451#comment-14596451 ] Inigo Goiri commented on YARN-3820: --- You may want to exclude the change in CommonNodeLabelsManager.java as it's not related to this patch. > Collect disks usages on the node > > > Key: YARN-3820 > URL: https://issues.apache.org/jira/browse/YARN-3820 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 3.0.0 >Reporter: Robert Grandl >Assignee: Robert Grandl > Labels: yarn-common, yarn-util > Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch, > YARN-3820-4.patch > > > In this JIRA we propose to collect disks usages on a node. This JIRA is part > of a larger effort of monitoring resource usages on the nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3820) Collect disks usages on the node
[ https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596463#comment-14596463 ] Robert Grandl commented on YARN-3820: - [~elgoiri], I fixed the warning because the Hadoop QA javadoc check was -1. I will revert the change if Hadoop QA returns +1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596461#comment-14596461 ] Hadoop QA commented on YARN-3798: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740206/YARN-3798-branch-2.7.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8310/console | This message was automatically generated.
[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3635: - Attachment: YARN-3635.4.patch Sorry for my late response, [~vinodkv]. I just got some bandwidth to do the update. The attached ver.4 addresses most of your comments: queue-placement-rules is now a separate module in the RM, the scheduler initializes it, and RMAppManager uses it to do queue placing. The defined interfaces are not exactly the same as you suggested; I put in the minimal set of interfaces I thought was needed. You can take a look at {{org.apache.hadoop.yarn.server.resourcemanager.placement}} for details. The ver.4 patch also turns the original CapacityScheduler.QueueMapping into a rule: UserGroupPlacementRule. Thoughts? > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch > > > Currently, both the fair and capacity schedulers support queue mapping, which > means the scheduler can change the queue of an application after it is submitted. > One issue with doing this inside a specific scheduler is: if the queue after mapping > has a different maximum_allocation/default-node-label-expression from the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make queue mapping a common interface of the scheduler, and > have RMAppManager set the post-mapping queue before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler
[ https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596441#comment-14596441 ] Jian He commented on YARN-3790: --- lgtm, thanks [~zxu] and [~rohithsharma] > TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in > trunk for FS scheduler > > > Key: YARN-3790 > URL: https://issues.apache.org/jira/browse/YARN-3790 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Reporter: Rohith Sharma K S >Assignee: zhihai xu > Attachments: YARN-3790.000.patch > > > Failure trace is as follows > {noformat} > Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart > testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) > Time elapsed: 6.502 sec <<< FAILURE! > java.lang.AssertionError: expected:<6144> but was:<8192> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596443#comment-14596443 ] Gera Shegalov commented on YARN-3768: - Instead of executing two regexes (first directly via Pattern p = Pattern.compile(Shell.getEnvironmentVariableRegex()) and then via split), can we simply match via a single regex? We can use a capture group to get the value. > Index out of range exception with environment variables without values > -- > > Key: YARN-3768 > URL: https://issues.apache.org/jira/browse/YARN-3768 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.5.0 >Reporter: Joe Ferner >Assignee: zhihai xu > Attachments: YARN-3768.000.patch, YARN-3768.001.patch > > > Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range > exception occurs if an environment variable is encountered without a value. > I believe this occurs because Java will not return empty strings from the > split method. Similar to this: > http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
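The single-regex idea suggested above could be sketched as follows. This is an illustrative assumption, not the actual patch: the name pattern and the EnvParser class are hypothetical stand-ins (Hadoop's real Shell.getEnvironmentVariableRegex() may differ), but it shows how one capture group for the value handles valueless variables without any split/array-index assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: capture the variable name and the (possibly empty)
// value in one pass, instead of matching the name and then calling
// split("="), which drops trailing empty strings and triggers the
// index-out-of-range bug for entries like "FOO=".
public class EnvParser {
  // Assumed name pattern, not Hadoop's actual regex.
  private static final Pattern ENV_ENTRY =
      Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)=([^,]*)");

  public static Map<String, String> parse(String envString) {
    Map<String, String> env = new HashMap<>();
    Matcher m = ENV_ENTRY.matcher(envString);
    while (m.find()) {
      // group(2) is "" for valueless entries such as "EMPTY=", so no
      // array indexing is needed at all.
      env.put(m.group(1), m.group(2));
    }
    return env;
  }
}
```

For example, parsing "JAVA_HOME=/opt/jdk,EMPTY=,A=b" yields an empty string for EMPTY rather than throwing.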
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596437#comment-14596437 ] Giovanni Matteo Fumarola commented on YARN-3116: Thanks [~zjshen] for the quick review and your comments. 1. I agree that ContainerTokenIdentifier would be a better place to do it, so that we keep the flag internal, but the ContainerTokenIdentifier is created before the state transition in RMAppAttempt that sets the AM flag in RMContainer. I could try to recreate the ContainerTokenIdentifier at AM launch, but that looks unwieldy. Do you have any suggestions on how to do this more cleanly? 2. Again a good observation; I'll add this in the next iteration of the patch, based on your suggestion for (1) above. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine from the context in NM whether the container > is an AM container or not (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need a way to determine whether a container is an AM > container on NM. We can add a flag to the container object or create an API to > make the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596337#comment-14596337 ] Hadoop QA commented on YARN-2902: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 36s | The applied patch generated 25 new checkstyle issues (total was 168, now 187). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 24s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 43m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741076/YARN-2902.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8309/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8309/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8309/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8309/console | This message was automatically generated. > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596327#comment-14596327 ] Robert Kanter commented on YARN-3835: - +1 > hadoop-yarn-server-resourcemanager test package bundles core-site.xml, > yarn-site.xml > > > Key: YARN-3835 > URL: https://issues.apache.org/jira/browse/YARN-3835 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Vamsee Yarlagadda >Assignee: Vamsee Yarlagadda >Priority: Minor > Attachments: YARN-3835.patch > > > It looks like by default YARN is bundling core-site.xml, yarn-site.xml in > test artifact of hadoop-yarn-server-resourcemanager which means that any > downstream project which uses this as a dependency can have a problem in picking > up the user supplied/environment supplied core-site.xml, yarn-site.xml > So we should ideally exclude these .xml files from being bundled into the > test-jar. (Similar to YARN-1748) > I also proactively looked at other YARN modules where this might be > happening. > {code} > vamsee-MBP:hadoop-yarn-project vamsee$ find .
-name "*-site.xml" > ./hadoop-yarn/conf/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml > {code} > And out of these only two modules (hadoop-yarn-server-resourcemanager, > hadoop-yarn-server-tests) are building test-jars. In future, if we start > building test-jar of other modules, we should exclude these xml files from > being bundled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
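One plausible shape for the fix described above is an excludes list on the maven-jar-plugin's {{test-jar}} goal. The fragment below is only a sketch: it assumes the module wires the test-jar through maven-jar-plugin (as sibling Hadoop modules do), and the real pom may use a different execution id or configuration.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>test-jar</goal>
      </goals>
      <configuration>
        <!-- keep environment-specific configs out of the test-jar so
             downstream projects pick up their own *-site.xml -->
        <excludes>
          <exclude>core-site.xml</exclude>
          <exclude>yarn-site.xml</exclude>
        </excludes>
      </configuration>
    </execution>
  </executions>
</plugin>
```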
[jira] [Updated] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-3835: Target Version/s: 2.8.0 > hadoop-yarn-server-resourcemanager test package bundles core-site.xml, > yarn-site.xml > > > Key: YARN-3835 > URL: https://issues.apache.org/jira/browse/YARN-3835 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Vamsee Yarlagadda >Assignee: Vamsee Yarlagadda >Priority: Minor > Attachments: YARN-3835.patch > > > It looks like by default YARN is bundling core-site.xml, yarn-site.xml in > test artifact of hadoop-yarn-server-resourcemanager which means that any > downstream project which uses this as a dependency can have a problem in picking > up the user supplied/environment supplied core-site.xml, yarn-site.xml > So we should ideally exclude these .xml files from being bundled into the > test-jar. (Similar to YARN-1748) > I also proactively looked at other YARN modules where this might be > happening. > {code} > vamsee-MBP:hadoop-yarn-project vamsee$ find .
-name "*-site.xml" > ./hadoop-yarn/conf/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml > ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml > {code} > And out of these only two modules (hadoop-yarn-server-resourcemanager, > hadoop-yarn-server-tests) are building test-jars. In future, if we start > building test-jar of other modules, we should exclude these xml files from > being bundled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
[ https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596257#comment-14596257 ] Siqi Li commented on YARN-3176: --- Hi [~djp], can you take a look at patch v2? The checkstyle issues and test errors do not seem to apply to this patch. > In Fair Scheduler, child queue should inherit maxApp from its parent > > > Key: YARN-3176 > URL: https://issues.apache.org/jira/browse/YARN-3176 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-3176.v1.patch, YARN-3176.v2.patch > > > If the child queue does not have a maxRunningApp limit, it will use the > queueMaxAppsDefault. This behavior is not quite right, since > queueMaxAppsDefault is normally a small number, whereas some parent queues do > have maxRunningApp set to be more than the default -- This message was sent by Atlassian JIRA (v6.3.4#6332)
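To make the scenario concrete, here is a minimal hypothetical fair-scheduler.xml allocation file (queue names and limits invented for illustration): the child sets no maxRunningApps, so today it falls back to queueMaxAppsDefault even though its parent allows far more; the proposal is for the child to inherit the parent's limit instead.

```xml
<allocations>
  <queue name="analytics">
    <!-- parent explicitly allows up to 500 running apps -->
    <maxRunningApps>500</maxRunningApps>
    <queue name="adhoc">
      <!-- no maxRunningApps here: current behavior caps this queue at
           queueMaxAppsDefault (50); the patch proposes inheriting 500 -->
    </queue>
  </queue>
  <queueMaxAppsDefault>50</queueMaxAppsDefault>
</allocations>
```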
[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2902: --- Attachment: YARN-2902.03.patch > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui bug on main view after application number 9999
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596230#comment-14596230 ] Xuan Gong commented on YARN-3840: - [~Alexandre LINTE] Hey, could you tell us which version of Hadoop you are using? 2.7? > Resource Manager web ui bug on main view after application number 9999 > -- > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Centos 6.6 > Java 1.7 >Reporter: LINTE > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > 9999. > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596173#comment-14596173 ] Ted Yu commented on YARN-3815: -- My comment is related to the usage of HBase. bq. under framework_specific_metrics column family Since the column family name appears in every KeyValue, it would be better to use a very short column family name, e.g. f_m for framework metrics. > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level, expect return: aggregated stats for a flow_run, flow_version > and flow > - User level, expect return: aggregated stats for applications submitted by > user > - Queue level, expect return: aggregated stats for applications within the > Queue > Application states is the basic building block for all other level > aggregations. We can provide Flow/User/Queue level aggregated statistics info > based on application states (a dedicated table for application states is > needed which is missing from previous design documents like HBase/Phoenix > schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
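To illustrate why the family name matters: HBase physically stores the column family name with every KeyValue, so its length is paid once per cell. This toy Python estimate (not HBase code; it ignores row keys, qualifiers, timestamps, and compression) shows the raw savings of a short family name:

```python
def family_name_bytes(cf_name, num_cells):
    """Bytes spent storing the column family name alone, once per cell."""
    return len(cf_name.encode("utf-8")) * num_cells

cells = 1_000_000
saved = (family_name_bytes("framework_specific_metrics", cells)
         - family_name_bytes("f_m", cells))
print(saved)  # 23000000 bytes (~23 MB) per million cells, before compression
```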
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596129#comment-14596129 ] Junping Du commented on YARN-3815: -- Thanks [~sjlee0] and [~jrottinghuis] for the detailed review and good comments. [~jrottinghuis]'s comments are pretty long, so I could only reply to part of them and will finish the remaining parts tomorrow. :) bq. For framework-specific metrics, I would say this falls on the individual frameworks. The framework AM usually already aggregates them in memory (consider MR job counters for example). So for them it is straightforward to write them out directly onto the YARN app entities. Furthermore, it is problematic to add them to the sub-app YARN entities and ask YARN to aggregate them to the application. Framework’s sub-app entities may not even align with YARN’s sub-app entities. For example, in case of MR, there is a reasonable one-to-one mapping between a mapper/reducer task attempt and a container, but for other applications that may not be true. Forcing all frameworks to hang values at containers may not be practical. I think it’s far easier for frameworks to write aggregated values to the YARN app entities. The AM currently leverages YARN's AppTimelineCollector to forward entities to backend storage, so making the AM talk directly to backend storage is not considered safe. It is also unnecessary, because the real difficulty here is aggregating framework-specific metrics at the other levels (flow, user and queue): those go beyond the life cycle of the framework, so YARN has to take care of them. Instead of asking frameworks to handle specific metrics themselves, I would like to propose treating these metrics as "anonymous": the framework would pass both the metric name and value to YARN's collector, and YARN's collector could aggregate them and store them as dynamic columns (under the framework_specific_metrics column family) in the app states table. So other (flow, user, etc.)
level aggregation of framework metrics could happen based on this. bq. app-to-flow online aggregation. This is more or less live aggregated metrics at the flow level. This will still be based on the native HBase schema. About flow online aggregation, I am not quite sure about the requirement yet. Do we really want real time for flow-aggregated data, or would some fine-grained time interval (like 15 secs) be good enough? If we want to show some nice metrics chart for a flow, the latter should be fine. Even for real time, we don't have to aggregate everything from the raw entity table; we don't have to count metrics again for finished apps, do we? bq. (3) time-based flow aggregation: This is different than the online aggregation in the sense that it is aggregated along the time boundary (e.g. “daily”, “weekly”, etc.). This can be based on the Phoenix schema. This can be populated in an offline fashion (e.g. running a mapreduce job). Any special reason not to handle it in the same way as above, as an HBase coprocessor? It just sounds like a coarse-grained time interval, doesn't it? bq. This is another “offline” aggregation type. Also, I believe we’re talking about only time-based aggregation. In other words, we would aggregate values for users only with a well-defined time window. There won’t be a “real-time” aggregation of values, similar to the flow aggregation. I would also call for a fine-grained time interval (close to real time) here, because the aggregated resource metrics per user could be used for billing Hadoop usage in a shared environment (no matter whether private or public cloud), so users need to know more details on resource consumption, especially at random peak times. bq. Very much agree with separation into 2 categories "online" versus "periodic". I think this will be natural split between the native HBase tables for the former and the Phoenix approach for the latter to each emphasize their relative strengths.
I would question the necessity of "online" again if this means "real time" instead of a fine-grained time interval. Actually, as a building block, every container metric (CPU, memory, etc.) is generated at a time interval instead of in real time. As a result, we never know the exact snapshot of the whole system at a precise time; we can only try to get closer. > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the qu
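The fixed-interval aggregation debated above (e.g. 15-second buckets rather than true real time) can be sketched in a few lines. This is an illustrative Python toy, not the collector implementation; sample values and the bucketing-by-sum policy are invented for the example:

```python
from collections import defaultdict

def aggregate_by_interval(samples, interval_secs):
    """Roll (timestamp, value) metric samples up into fixed time buckets.

    Returns {bucket_start_secs: summed_value}. Purely illustrative.
    """
    buckets = defaultdict(int)
    for ts, value in samples:
        # Align each sample to the start of its interval.
        buckets[ts - (ts % interval_secs)] += value
    return dict(buckets)

# Four per-container memory samples (secs, MB), aggregated on 15s boundaries.
samples = [(100, 512), (104, 512), (116, 1024), (131, 256)]
print(aggregate_by_interval(samples, 15))  # {90: 1024, 105: 1024, 120: 256}
```

The same bucketing applies whether the interval is 15 seconds ("close to real time") or a day ("periodic"); only the grain changes, which is why the two categories can share one mechanism.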
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596089#comment-14596089 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2182 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2182/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596059#comment-14596059 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #234 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/234/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595984#comment-14595984 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #225 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/225/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595971#comment-14595971 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2164 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2164/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui bug on main view after application number 9999
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595896#comment-14595896 ] LINTE commented on YARN-3840: - Hi, There is no Java stack trace for this bug. I think that the property yarn.resourcemanager.max-completed-applications is the cause (default value is 10000), but it doesn't work properly. Maybe yarn.resourcemanager.max-completed-applications only takes effect in the ResourceManager GUI. Regards, > Resource Manager web ui bug on main view after application number 9999 > -- > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Centos 6.6 > Java 1.7 >Reporter: LINTE > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > 9999. > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
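For reference, the property under discussion is set in yarn-site.xml. Its stock default is 10000 completed applications retained by the ResourceManager; a minimal fragment (the value shown is just the default made explicit):

```xml
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <!-- maximum number of completed applications the RM keeps in memory,
       which bounds what the web UI can list -->
  <value>10000</value>
</property>
```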
[jira] [Updated] (YARN-3841) [Storage abstraction] Create HDFS backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3841: - Description: HDFS backing storage is useful for the following scenarios. 1. For Hadoop clusters which don't run HBase. 2. For fallback from HBase when the HBase cluster is temporarily unavailable. Quoting the ATS design document of YARN-2928: {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} was: HDFS backing storage is useful for following scenarios. 1. For Hadoop clusters which don't run HBase. 2. For fallback from HBase when HBase cluster is temporary unavailable. {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} > [Storage abstraction] Create HDFS backing storage implementation for ATS > writes > --- > > Key: YARN-3841 > URL: https://issues.apache.org/jira/browse/YARN-3841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > > HDFS backing storage is useful for the following scenarios. > 1. For Hadoop clusters which don't run HBase. > 2. For fallback from HBase when the HBase cluster is temporarily unavailable. > Quoting the ATS design document of YARN-2928: > {quote} > In the case the HBase > storage is not available, the plugin should buffer the writes temporarily > (e.g.
HDFS), and flush > them once the storage comes back online. Reading and writing to hdfs as the > backup storage could potentially use the HDFS writer plugin unless the complexity of > generalizing the HDFS > writer plugin for this purpose exceeds the benefits of reusing it here. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
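The buffer-then-flush behavior quoted from the design document can be sketched as follows. This is a Python toy with invented names (BufferingWriter, FlakyPrimary) purely to illustrate the idea; it is not the actual ATS plugin API, and the in-memory list stands in for HDFS spill files:

```python
class FlakyPrimary:
    """In-memory stand-in for the HBase-backed writer; raises while down."""
    def __init__(self):
        self.up = False
        self.stored = []

    def write(self, entity):
        if not self.up:
            raise ConnectionError("storage unavailable")
        self.stored.append(entity)


class BufferingWriter:
    """Buffers writes (e.g. to HDFS) while the primary store is down,
    then replays them in arrival order once it comes back online."""
    def __init__(self, primary):
        self.primary = primary
        self.buffer = []  # stand-in for the HDFS spill location

    def write(self, entity):
        try:
            self.primary.write(entity)
        except ConnectionError:
            self.buffer.append(entity)

    def flush(self):
        while self.buffer:
            self.primary.write(self.buffer.pop(0))


primary = FlakyPrimary()
writer = BufferingWriter(primary)
writer.write("entity-1")   # primary down -> buffered
primary.up = True
writer.write("entity-2")   # primary up -> written directly
writer.flush()             # replay the buffered entity
print(primary.stored)      # ['entity-2', 'entity-1']
```

Note the replay order: entities buffered during the outage land after ones written while the store was up, which a real implementation might need to reconcile via timestamps.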
[jira] [Created] (YARN-3841) [Storage abstraction] Create HDFS backing storage implementation for ATS writes
Tsuyoshi Ozawa created YARN-3841: Summary: [Storage abstraction] Create HDFS backing storage implementation for ATS writes Key: YARN-3841 URL: https://issues.apache.org/jira/browse/YARN-3841 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa HDFS backing storage is useful for the following scenarios. 1. For Hadoop clusters which don't run HBase. 2. For fallback from HBase when the HBase cluster is temporarily unavailable. {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui bug on main view after application number 9999
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595826#comment-14595826 ] Devaraj K commented on YARN-3840: - Thanks [~Alexandre LINTE] for reporting the issue. Can you paste the exception if you see anything in the RM UI or in the RM logs? > Resource Manager web ui bug on main view after application number > -- > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Centos 6.6 > Java 1.7 >Reporter: LINTE > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > . > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3840) Resource Manager web ui bug on main view after application number 9999
LINTE created YARN-3840: --- Summary: Resource Manager web ui bug on main view after application number 9999 Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Centos 6.6 Java 1.7 Reporter: LINTE On the WEBUI, the global main view page: http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595805#comment-14595805 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Yarn-trunk #966 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/966/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595749#comment-14595749 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #236 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/236/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3826) Race condition in ResourceTrackerService: potential wrong diagnostics messages
[ https://issues.apache.org/jira/browse/YARN-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595717#comment-14595717 ] Hadoop QA commented on YARN-3826:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 32s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 50m 43s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | | 89m 7s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12740355/YARN-3826.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8308/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8308/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8308/console |

This message was automatically generated.

> Race condition in ResourceTrackerService: potential wrong diagnostics messages
> --
>
> Key: YARN-3826
> URL: https://issues.apache.org/jira/browse/YARN-3826
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.0
> Reporter: Chengbing Liu
> Assignee: Chengbing Liu
> Attachments: YARN-3826.01.patch
>
> Since we are calling {{setDiagnosticsMessage}} in {{nodeHeartbeat}}, which
> can be called concurrently, the static {{resync}} and {{shutdown}} may have
> wrong diagnostics messages in some cases.
> On the other side, these static members can hardly save any memory, since the
> normal heartbeat responses are created for each heartbeat.
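The race described in the YARN-3826 summary comes from calling a setter on a shared static response object from concurrent heartbeat handlers. A hedged sketch of the shape of the problem and the fix, using simplified stand-in classes (not the actual ResourceTrackerService code): allocate an immutable response per heartbeat instead of mutating a shared instance.

```java
// Sketch of the race and its fix. These classes are simplified stand-ins,
// not the real ResourceTrackerService / NodeHeartbeatResponse types.
public class HeartbeatResponses {
    public static class Response {
        private final String action;
        private final String diagnostics;

        public Response(String action, String diagnostics) {
            this.action = action;
            this.diagnostics = diagnostics;
        }

        public String getAction() { return action; }
        public String getDiagnostics() { return diagnostics; }
    }

    // Broken pattern (what the summary describes): a static shared response
    // whose diagnostics is overwritten via setDiagnosticsMessage() on each
    // concurrent heartbeat, so one node's message can leak into another's.

    // Fixed pattern: build a fresh, immutable response per call. Each caller
    // owns its own diagnostics string, so there is nothing to race on.
    public static Response resync(String nodeId) {
        return new Response("RESYNC", "Node " + nodeId + " asked to resync");
    }
}
```

Since normal heartbeat responses are already allocated per heartbeat (as the summary notes), dropping the static instances costs essentially nothing in memory while removing the data race.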
[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595552#comment-14595552 ] Hadoop QA commented on YARN-3768:

| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 5s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 53s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. |
| | | | 40m 4s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12740968/YARN-3768.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8307/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8307/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8307/console |

This message was automatically generated.

> Index out of range exception with environment variables without values
> --
>
> Key: YARN-3768
> URL: https://issues.apache.org/jira/browse/YARN-3768
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.5.0
> Reporter: Joe Ferner
> Assignee: zhihai xu
> Attachments: YARN-3768.000.patch, YARN-3768.001.patch
>
> Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range
> exception occurs if an environment variable is encountered without a value.
> I believe this occurs because java will not return empty strings from the
> split method. Similar to this
> http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values
[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595495#comment-14595495 ] zhihai xu commented on YARN-3768:

Hi [~xgong], thanks for the review. I uploaded a new patch, YARN-3768.001.patch, which adds a test case verifying that bad environment variables are skipped. As for keeping trailing empty strings, that depends on whether an environment variable with an empty value is a valid use case. MAPREDUCE-5965 adds an option to configure an environment variable with an empty value when stream.jobconf.truncate.limit is 0, so it looks like an environment variable with an empty value may be a valid use case.

> Index out of range exception with environment variables without values
> --
>
> Key: YARN-3768
> URL: https://issues.apache.org/jira/browse/YARN-3768
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.5.0
> Reporter: Joe Ferner
> Assignee: zhihai xu
> Attachments: YARN-3768.000.patch, YARN-3768.001.patch
>
> Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range
> exception occurs if an environment variable is encountered without a value.
> I believe this occurs because java will not return empty strings from the
> split method. Similar to this
> http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values
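The trailing-empty-string behavior discussed above is standard `java.lang.String.split` semantics: with the default limit, trailing empty strings are removed from the result, so splitting `"FOO="` on `=` yields a one-element array and indexing element 1 throws `ArrayIndexOutOfBoundsException`; passing a negative limit preserves the empty value. A small demonstration (generic Java, not the actual `Apps` parsing code):

```java
// Demonstrates why "NAME=" (an env var without a value) breaks naive parsing:
// String.split with the default limit drops trailing empty strings.
public class SplitDemo {
    public static void main(String[] args) {
        String[] dropped = "FOO=".split("=");      // trailing "" removed
        System.out.println(dropped.length);        // 1 -> dropped[1] would throw

        String[] kept = "FOO=".split("=", -1);     // negative limit keeps it
        System.out.println(kept.length);           // 2: ["FOO", ""]

        String[] normal = "FOO=bar".split("=");    // the common two-part case
        System.out.println(normal[0] + "/" + normal[1]);
    }
}
```

Using `split("=", 2)` would also be a reasonable choice here, since an environment variable's value may itself contain `=` characters.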
[jira] [Updated] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3768:

Attachment: YARN-3768.001.patch

> Index out of range exception with environment variables without values
> --
>
> Key: YARN-3768
> URL: https://issues.apache.org/jira/browse/YARN-3768
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.5.0
> Reporter: Joe Ferner
> Assignee: zhihai xu
> Attachments: YARN-3768.000.patch, YARN-3768.001.patch
>
> Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range
> exception occurs if an environment variable is encountered without a value.
> I believe this occurs because java will not return empty strings from the
> split method. Similar to this
> http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values
[jira] [Commented] (YARN-3826) Race condition in ResourceTrackerService: potential wrong diagnostics messages
[ https://issues.apache.org/jira/browse/YARN-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595457#comment-14595457 ] Hadoop QA commented on YARN-3826:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 19m 42s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 9m 30s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 42s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 28s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 40s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 44s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 61m 36s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | | 106m 18s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
| | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12740355/YARN-3826.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c7a9d5 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8306/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8306/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8306/console |

This message was automatically generated.

> Race condition in ResourceTrackerService: potential wrong diagnostics messages
> --
>
> Key: YARN-3826
> URL: https://issues.apache.org/jira/browse/YARN-3826
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.0
> Reporter: Chengbing Liu
> Assignee: Chengbing Liu
> Attachments: YARN-3826.01.patch
>
> Since we are calling {{setDiagnosticsMessage}} in {{nodeHeartbeat}}, which
> can be called concurrently, the static {{resync}} and {{shutdown}} may have
> wrong diagnostics messages in some cases.
> On the other side, these static members can hardly save any memory, since the
> normal heartbeat responses are created for each heartbeat.