[jira] [Commented] (YARN-2684) FairScheduler should tolerate queue configuration changes across RM restarts
[ https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532353#comment-14532353 ] Hadoop QA commented on YARN-2684: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 33s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 1 new checkstyle issues (total was 74, now 75). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 52m 33s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 88m 45s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731104/0002-YARN-2684.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 305e473 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7757/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7757/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7757/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7757/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7757/console | This message was automatically generated. FairScheduler should tolerate queue configuration changes across RM restarts Key: YARN-2684 URL: https://issues.apache.org/jira/browse/YARN-2684 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2684.patch, 0002-YARN-2684.patch YARN-2308 fixes this issue for CS, this JIRA is to fix it for FS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1564) add some basic workflow YARN services
[ https://issues.apache.org/jira/browse/YARN-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532354#comment-14532354 ] Steve Loughran commented on YARN-1564: -- The sync-on-AtomicBoolean warnings are spurious; the code is simply using the (final) objects as something to wait/notify on to indicate status updates within the atomic values themselves. Nothing wrong with that: it avoids having a separate lock object, and makes it obvious what things are waiting on. Presumably the check is there for people who don't know what they are doing. add some basic workflow YARN services - Key: YARN-1564 URL: https://issues.apache.org/jira/browse/YARN-1564 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Attachments: YARN-1564-001.patch, YARN-1564-002.patch Original Estimate: 24h Time Spent: 48h Remaining Estimate: 0h I've been using some alternative composite services to help build workflows of process execution in a YARN AM. They and their tests could be moved into YARN for use by others - this would make it easier to build aggregate services in an AM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
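For readers unfamiliar with the pattern being flagged, a minimal self-contained sketch of it follows (illustrative only, not code from the YARN-1564 patch; the class and method names are made up): the final AtomicBoolean carries the status value and doubles as the monitor that threads wait/notify on.

{code}
// Illustrative sketch only -- not from the YARN-1564 patch.
import java.util.concurrent.atomic.AtomicBoolean;

public class StatusFlag {
  // final, so it is safe to use as a lock object, and it holds the status itself
  private final AtomicBoolean stopped = new AtomicBoolean(false);

  /** Set the flag and wake up anything waiting on it. */
  public void markStopped() {
    synchronized (stopped) {
      stopped.set(true);
      stopped.notifyAll();
    }
  }

  /** Block until the flag is set. */
  public void awaitStopped() throws InterruptedException {
    synchronized (stopped) {
      while (!stopped.get()) {
        stopped.wait();
      }
    }
  }
}
{code}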
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532150#comment-14532150 ] Hadoop QA commented on YARN-3044: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 6s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 1m 24s | The applied patch generated 15 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 19s | The applied patch generated 5 new checkstyle issues (total was 266, now 270). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 17s | The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 53m 37s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 95m 35s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-api | | | org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric$1.compare(Long, Long) negates the return value of Long.compareTo(Long) At TimelineMetric.java:value of Long.compareTo(Long) At TimelineMetric.java:[line 47] | | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.handle(SystemMetricsEvent) At TimelineServiceV1Publisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.handle(SystemMetricsEvent) At TimelineServiceV1Publisher.java:[line 103] | | | Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.handle(SystemMetricsEvent) At TimelineServiceV1Publisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.handle(SystemMetricsEvent) At TimelineServiceV1Publisher.java:[line 100] | | | Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.handle(SystemMetricsEvent) At 
TimelineServiceV1Publisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.handle(SystemMetricsEvent) At TimelineServiceV1Publisher.java:[line 97] | | | Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.handle(SystemMetricsEvent) At TimelineServiceV1Publisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.handle(SystemMetricsEvent) At TimelineServiceV1Publisher.java:[line 91] | | | Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.handle(SystemMetricsEvent) At TimelineServiceV1Publisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent in
[jira] [Updated] (YARN-2684) FairScheduler should tolerate queue configuration changes across RM restarts
[ https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2684: - Attachment: 0002-YARN-2684.patch [~xgong] thanks for looking into the patch. Rebased the patch and updated it. Kindly review the updated patch. FairScheduler should tolerate queue configuration changes across RM restarts Key: YARN-2684 URL: https://issues.apache.org/jira/browse/YARN-2684 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Rohith Priority: Critical Attachments: 0001-YARN-2684.patch, 0002-YARN-2684.patch YARN-2308 fixes this issue for CS; this JIRA is to fix it for FS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3589) RM and AH web UI display DOCTYPE
[ https://issues.apache.org/jira/browse/YARN-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3589: - Attachment: 0001-YARN-3589.patch Attached a patch fixing the issue. Kindly review the patch. RM and AH web UI display DOCTYPE Key: YARN-3589 URL: https://issues.apache.org/jira/browse/YARN-3589 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.8.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3589.patch, YARN-3589.PNG The RM web app UI displays {{!DOCTYPE html PUBLIC -\/\/W3C\/\/DTD HTML 4.01\/\/EN http:\/\/www.w3.org\/TR\/html4\/strict.dtd}}, which is not necessary. This happens because the content of the HTML page is escaped, so the browser cannot parse it. Any escaped content should sit within the HTML block, but the doctype is emitted above the html element, so the browser cannot parse it and renders it as text. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2684) FairScheduler should tolerate queue configuration changes across RM restarts
[ https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2684: - Labels: BB2015-05-TBR (was: ) FairScheduler should tolerate queue configuration changes across RM restarts Key: YARN-2684 URL: https://issues.apache.org/jira/browse/YARN-2684 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Rohith Priority: Critical Labels: BB2015-05-TBR Attachments: 0001-YARN-2684.patch, 0002-YARN-2684.patch YARN-2308 fixes this issue for CS, this JIRA is to fix it for FS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3577) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/YARN-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3577: Affects Version/s: 2.7.0 Labels: (was: BB2015-05-TBR) Misspelling of threshold in log4j.properties for tests -- Key: YARN-3577 URL: https://issues.apache.org/jira/browse/YARN-3577 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3577.patch log4j.properties file for test contains misspelling log4j.threshhold. We should use log4j.threshold correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532250#comment-14532250 ] Varun Saxena commented on YARN-868: --- Thanks [~djp] and [~gtCarrera9] for reviewing this. To give you some background, refer to the linked [file in Tez|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/ResourceMgrDelegate.java]. As you can see there, Tez needs the service address while getting the delegation token. This address is currently taken from the config {{yarn.resourcemanager.address}} instead of the actual RM address which returned the token, in the HA case. I am not aware of what Tez does with this service address, though; Hitesh can probably throw some light on that. Coming to the other comments, bq. Do we want to mark the newly added getServiceAddress to be public and stable (especially when we have a private and unstable setter)? True, I will mark it as public and unstable. bq. Since we're adding a new field in tokens, do you think it's worthwhile to add special test cases on the system working with old token formats? We may not want to simply exempt them in the tests but define a standard behavior for them. Personally, I think this is important to keep rolling upgrade feature safe. The change in the token applies only to the client side. I haven't changed TokenProto, so the token transferred over the wire remains the same. I did not create a new class to incorporate this change, in order to keep the API backwards compatible. Let me know if you feel differently. Will fix the checkstyle and whitespace issues. YarnClient should set the service address in tokens returned by getRMDelegationToken() -- Key: YARN-868 URL: https://issues.apache.org/jira/browse/YARN-868 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Varun Saxena Labels: BB2015-05-TBR Attachments: YARN-868.patch Either the client should set this information into the token or the client layer should expose an API that returns the service address. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
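To make the Tez scenario above concrete, here is a hedged sketch of the client-side pattern being described (not the YARN-868 patch; the helper class, method and variable names are illustrative): the service address is stamped onto the token from yarn.resourcemanager.address, which in an HA setup may not be the RM that actually issued the token.

{code}
// Illustrative sketch of the current client-side workaround, not the proposed fix.
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RMTokenServiceExample {
  static void setServiceFromConf(Token<? extends TokenIdentifier> rmDelegationToken,
      Configuration conf) {
    // Address taken from configuration rather than from the RM that
    // returned the token -- the gap this JIRA wants YarnClient to close.
    InetSocketAddress rmAddress = conf.getSocketAddr(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_PORT);
    SecurityUtil.setTokenService(rmDelegationToken, rmAddress);
  }
}
{code}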
[jira] [Created] (YARN-3589) RM and AH web UI display DOCTYPE
Rohith created YARN-3589: Summary: RM and AH web UI display DOCTYPE Key: YARN-3589 URL: https://issues.apache.org/jira/browse/YARN-3589 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.8.0 Reporter: Rohith Assignee: Rohith The RM web app UI displays {{!DOCTYPE html PUBLIC -\/\/W3C\/\/DTD HTML 4.01\/\/EN http:\/\/www.w3.org\/TR\/html4\/strict.dtd}}, which is not necessary. This happens because the content of the HTML page is escaped, so the browser cannot parse it. Any escaped content should sit within the HTML block, but the doctype is emitted above the html element, so the browser cannot parse it and renders it as text. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532307#comment-14532307 ] Steve Loughran commented on YARN-679: - I haven't gone near this recently; I'd like the workflow stuff to go in first as then they can hook up in order add an entry point that can start any Yarn service -- Key: YARN-679 URL: https://issues.apache.org/jira/browse/YARN-679 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Labels: BB2015-05-TBR Attachments: YARN-679-001.patch, YARN-679-002.patch, YARN-679-002.patch, YARN-679-003.patch, org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf Time Spent: 72h Remaining Estimate: 0h There's no need to write separate .main classes for every Yarn service, given that the startup mechanism should be identical: create, init, start, wait for stopped -with an interrupt handler to trigger a clean shutdown on a control-c interrrupt. Provide one that takes any classname, and a list of config files/options -- This message was sent by Atlassian JIRA (v6.3.4#6332)
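As a rough illustration of the lifecycle the description lays out (create, init, start, wait for stopped, with an interrupt handler for clean shutdown), here is a hedged sketch; it is not the YARN-679 patch, and the class name and bare-bones argument handling are illustrative only.

{code}
// Illustrative sketch only -- not the YARN-679 entry point.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.Service;

public class SimpleServiceRunner {
  public static void main(String[] args) throws Exception {
    String serviceClassName = args[0];   // any org.apache.hadoop.service.Service
    Configuration conf = new Configuration();
    final Service service = (Service) Class.forName(serviceClassName).newInstance();

    // Trigger a clean shutdown on Ctrl-C / SIGTERM.
    Runtime.getRuntime().addShutdownHook(new Thread() {
      @Override
      public void run() {
        service.stop();
      }
    });

    service.init(conf);
    service.start();
    service.waitForServiceToStop(0);     // 0 = wait indefinitely
  }
}
{code}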
[jira] [Updated] (YARN-3589) RM and AH web UI display DOCTYPE
[ https://issues.apache.org/jira/browse/YARN-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3589: - Attachment: YARN-3589.PNG RM and AH web UI display DOCTYPE Key: YARN-3589 URL: https://issues.apache.org/jira/browse/YARN-3589 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.8.0 Reporter: Rohith Assignee: Rohith Attachments: YARN-3589.PNG The RM web app UI displays {{!DOCTYPE html PUBLIC -\/\/W3C\/\/DTD HTML 4.01\/\/EN http:\/\/www.w3.org\/TR\/html4\/strict.dtd}}, which is not necessary. This happens because the content of the HTML page is escaped, so the browser cannot parse it. Any escaped content should sit within the HTML block, but the doctype is emitted above the html element, so the browser cannot parse it and renders it as text. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3590) Changing the maxRunningApps attribute of a FairScheduler queue should take effect for applications already in the queue, but does not
zhoulinlin created YARN-3590: Summary: Changing the maxRunningApps attribute of a FairScheduler queue should take effect for applications already in the queue, but does not Key: YARN-3590 URL: https://issues.apache.org/jira/browse/YARN-3590 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.2 Reporter: zhoulinlin Change the queue attribute maxRunningApps for the FairScheduler and then refresh the queues: it should affect the applications in the queue immediately, but it does not. For the change to take effect, another application has to move out of the queue. For example: with maxRunningApps set to 0, submit an application A to this queue; it cannot run. Then change maxRunningApps from 0 to 2 and refresh the queue; application A should run, but it does not. If you submit another application B, application A only runs once application B completes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2442) ResourceManager JMX UI does not give HA State
[ https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2442: - Attachment: 0001-YARN-2442.patch Attached the patch, without a test. Verified the patch on a cluster by retrieving the HAState using JMX. Kindly review the patch. ResourceManager JMX UI does not give HA State - Key: YARN-2442 URL: https://issues.apache.org/jira/browse/YARN-2442 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0, 2.6.0, 2.7.0 Reporter: Nishan Shetty Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-YARN-2442.patch ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, STOPPED) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2442) ResourceManager JMX UI does not give HA State
[ https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2442: - Labels: BB2015-05-TBR (was: ) ResourceManager JMX UI does not give HA State - Key: YARN-2442 URL: https://issues.apache.org/jira/browse/YARN-2442 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0, 2.6.0, 2.7.0 Reporter: Nishan Shetty Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-YARN-2442.patch ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, STOPPED) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3547) FairScheduler: Apps that have no resource demand should not participate in scheduling
[ https://issues.apache.org/jira/browse/YARN-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532131#comment-14532131 ] Xianyin Xin commented on YARN-3547: --- Thanks for your feedback [~vinodkv]. We use SchedulerApplicationAttempt.getAppAttemptResourceUsage().getPending(), not hasPendingResourceRequest(). In fact, I don't quite understand what you mean: attemptResourceUsage records the resource information of an attempt, regardless of whether it is scheduled by CS or Fair. Would you please explain your concern? FairScheduler: Apps that have no resource demand should not participate in scheduling -- Key: YARN-3547 URL: https://issues.apache.org/jira/browse/YARN-3547 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Xianyin Xin Assignee: Xianyin Xin Labels: BB2015-05-TBR Attachments: YARN-3547.001.patch, YARN-3547.002.patch, YARN-3547.003.patch, YARN-3547.004.patch At present, all of the 'running' apps participate in the scheduling process; however, most of them may have no resource demand on a production cluster, since an app's status is 'running' rather than 'waiting for resources' for most of its lifetime. It is not wise to sort all of the 'running' apps and try to fulfill them, especially on a large-scale cluster with a heavy scheduling load. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
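For context, a hedged sketch of the check under discussion follows (not any of the attached patches); it uses the getAppAttemptResourceUsage().getPending() call named above, while the helper class, method names and surrounding loop are illustrative.

{code}
// Illustrative sketch only -- not YARN-3547 patch code.
import java.util.List;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt;

public class PendingDemandFilter {
  /** True if the attempt currently has outstanding resource demand. */
  static boolean hasPendingDemand(FSAppAttempt app) {
    Resource pending = app.getAppAttemptResourceUsage().getPending();
    return pending.getMemory() > 0 || pending.getVirtualCores() > 0;
  }

  /** Skip no-demand apps before the expensive sort-and-assign pass. */
  static void assignContainers(List<FSAppAttempt> runnableApps) {
    for (FSAppAttempt app : runnableApps) {
      if (!hasPendingDemand(app)) {
        continue;   // nothing to allocate for this app right now
      }
      // ... existing container assignment logic would run here ...
    }
  }
}
{code}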
[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532167#comment-14532167 ] Hadoop QA commented on YARN-3505: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 13s | The applied patch generated 4 new checkstyle issues (total was 71, now 66). | | {color:red}-1{color} | whitespace | 0m 21s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 9s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 5m 50s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 62m 48s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 108m 28s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731082/YARN-3505.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 918af8e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7755/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7755/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7755/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7755/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7755/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7755/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7755/console | This message was automatically generated. 
Node's Log Aggregation Report with SUCCEED should not cached in RMApps -- Key: YARN-3505 URL: https://issues.apache.org/jira/browse/YARN-3505 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Affects Versions: 2.8.0 Reporter: Junping Du Assignee: Xuan Gong Priority: Critical Attachments: YARN-3505.1.patch, YARN-3505.2.patch Per discussions in YARN-1402, we shouldn't cache all node's log aggregation reports in RMApps for always, especially for those finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3589) RM and AH web UI display DOCTYPE
[ https://issues.apache.org/jira/browse/YARN-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532323#comment-14532323 ] Rohith commented on YARN-3589: -- YARN-1993 escapes the content written within the html block, which is required to prevent cross-site scripting. But the same logic is being applied to content outside the html block, which the browser does not parse and displays as-is. RM and AH web UI display DOCTYPE Key: YARN-3589 URL: https://issues.apache.org/jira/browse/YARN-3589 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.8.0 Reporter: Rohith Assignee: Rohith Attachments: YARN-3589.PNG The RM web app UI displays {{!DOCTYPE html PUBLIC -\/\/W3C\/\/DTD HTML 4.01\/\/EN http:\/\/www.w3.org\/TR\/html4\/strict.dtd}}, which is not necessary. This happens because the content of the HTML page is escaped, so the browser cannot parse it. Any escaped content should sit within the HTML block, but the doctype is emitted above the html element, so the browser cannot parse it and renders it as text. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
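A framework-agnostic sketch of the behavior being described may help here (this is not the Hamlet or YARN-1993 code; the servlet-style method is illustrative): the doctype has to be written verbatim above the html element, and only the page content inside the html block should be escaped.

{code}
// Illustrative sketch only -- not the RM webapp rendering code.
import java.io.PrintWriter;

import org.apache.commons.lang.StringEscapeUtils;

public class DoctypeExample {
  static void renderPage(PrintWriter out, String pageBody) {
    // Emitted unescaped, above the <html> element, so the browser honours it.
    out.println("<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" "
        + "\"http://www.w3.org/TR/html4/strict.dtd\">");
    // Only content within the html block is escaped (the XSS protection
    // that YARN-1993 introduced).
    out.println("<html><body>"
        + StringEscapeUtils.escapeHtml(pageBody)
        + "</body></html>");
  }
}
{code}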
[jira] [Updated] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2821: Attachment: YARN-2821.002.patch Uploaded a new patch that correctly handles AM restarts and doesn't launch unnecessary containers. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, 
containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_05 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_04 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_05 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO
[jira] [Updated] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3044: Attachment: YARN-3044-YARN-2928.006.patch Hi [~sjlee0] [~djp], uploading a patch that addresses the applicable checkstyle issues, findbugs issues and test case failures (those which are due to this patch or the branch). Hoping for faster reviews on this jira and YARN-3045, so that I can pick up some other work on YARN-2928. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Labels: BB2015-05-TBR Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per the design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3592) Fix typos in RMNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533114#comment-14533114 ] Hadoop QA commented on YARN-3592: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 26s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 54m 23s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 90m 21s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731185/0001-YARN-3592.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 8e991f4 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7763/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7763/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7763/console | This message was automatically generated. Fix typos in RMNodeLabelsManager Key: YARN-3592 URL: https://issues.apache.org/jira/browse/YARN-3592 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Assignee: Sunil G Labels: newbie Attachments: 0001-YARN-3592.patch acccessibleNodeLabels = accessibleNodeLabels in many places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2918: - Attachment: YARN-2918.3.patch Attached ver.3, addressed comments from [~jianhe] Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Wangda Tan Labels: BB2015-05-TBR Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if admin setup labels on queues {{queue-path.accessible-node-labels = ...}}. And the label is not added to RM, queue's initialization will fail and RM will fail too: {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager ... Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} This is not a good user experience, we should stop fail RM so that admin can configure queue/labels in following steps: - Configure queue (with label) - Start RM - Add labels to RM - Submit applications Now admin has to: - Configure queue (without label) - Start RM - Add labels to RM - Refresh queue's config (with label) - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533256#comment-14533256 ] Wei Yan commented on YARN-2194: --- [~vinodkv], discussed this with Karthik offline. We thought it would be better to first make the existing cgroups code work well on RedHat 7, and after that we can provide a separate systemd-based solution. I can update the patch later. Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2194-1.patch In previous versions of RedHat, we could build custom cgroup hierarchies using the cgconfig command from the libcgroup package. From RedHat 7, the libcgroup package is deprecated and its use is not recommended, since it can easily create conflicts with the default cgroup hierarchy. systemd is provided and recommended for cgroup management instead. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pascal oliva updated YARN-3460: --- Attachment: YARN-3460-3.patch Updated whitespace. Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM Key: YARN-3460 URL: https://issues.apache.org/jira/browse/YARN-3460 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.6.0 Environment: $ mvn -version Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 2014-02-14T11:37:52-06:00) Maven home: /opt/apache-maven-3.2.1 Java version: 1.7.0, vendor: IBM Corporation Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux, version: 3.10.0-229.ael7b.ppc64le, arch: ppc64le, family: unix Reporter: pascal oliva Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, YARN-3460-2.patch, YARN-3460-3.patch TestSecureRMRegistryOperations failed with the IBM JAVA JVM: mvn test -X -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations Module hadoop-yarn-registry: Total 12, Failures 0, Errors 12, Skipped 0 - overall Total: 12, 0, 12, 0. The failures are javax.security.auth.login.LoginException: Bad JAAS configuration: unrecognized option: isInitiator and Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533313#comment-14533313 ] Jonathan Eagles commented on YARN-3448: --- Thanks so much, everyone! Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Fix For: 2.8.0 Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.12.patch, YARN-3448.13.patch, YARN-3448.14.patch, YARN-3448.15.patch, YARN-3448.16.patch, YARN-3448.17.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can take hours. If we relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on its unique usage patterns, and with 5 separate databases each lookup is much faster. This can also help with I/O, by placing the entity and index databases on separate disks. Use rolling DBs for the entity and index DBs; 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must add a constraint that an entity's events are always placed into the correct rolling DB instance based on its start time. This allows us to stitch the data back together while reading, with artificial paging. Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster. Prefer sequential writes: sequential writes can be several times faster than random writes, so spend some small effort arranging the writes so that they trend towards sequential write performance rather than random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
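To illustrate the constraint that an entity's events always land in the rolling DB instance chosen by the entity's start time, here is a small hedged sketch; it is not from any of the attached patches, and the rolling period and directory naming are assumptions.

{code}
// Illustrative sketch only -- period and naming scheme are assumptions.
import java.io.File;
import java.util.concurrent.TimeUnit;

public class RollingDbChooser {
  private static final long ROLLING_PERIOD_MS = TimeUnit.HOURS.toMillis(1);

  /**
   * Every event of an entity goes to the DB instance owned by the rolling
   * period containing the entity's start time, so old periods can be dropped
   * as whole directories and reads can stitch the periods back together.
   */
  static File dbForEntity(File baseDir, long entityStartTimeMs) {
    long periodStart = (entityStartTimeMs / ROLLING_PERIOD_MS) * ROLLING_PERIOD_MS;
    return new File(baseDir, "entity-ldb." + periodStart);
  }
}
{code}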
[jira] [Resolved] (YARN-2686) CgroupsLCEResourcesHandler does not support the default Redhat 7/CentOS 7
[ https://issues.apache.org/jira/browse/YARN-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-2686. --- Resolution: Duplicate Tx for the discussion. Closing as a dup of YARN-2194. CgroupsLCEResourcesHandler does not support the default Redhat 7/CentOS 7 - Key: YARN-2686 URL: https://issues.apache.org/jira/browse/YARN-2686 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Beckham007 CgroupsLCEResourcesHandler uses ',' to separate the resourcesOption. Redhat 7 uses /sys/fs/cgroup/cpu,cpuacct as the cpu mount dir, so container-executor would use the wrong path /sys/fs/cgroup/cpu as the container tasks file. It should be /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_id/tasks. We should use some other character instead of ','. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
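A tiny sketch of the separator clash the reporter describes (illustrative only; the option string and container name below are made up, and this shows the problem rather than a fix): splitting the resources option on ',' breaks as soon as the controller mount point itself contains a comma, as /sys/fs/cgroup/cpu,cpuacct does on RedHat 7.

{code}
// Illustrative sketch only -- demonstrates the separator clash, not the fix.
public class CommaSeparatorClash {
  public static void main(String[] args) {
    String resourcesOption =
        "cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_01/tasks";
    // Splitting on "," cuts the path in half:
    //   "cgroups=/sys/fs/cgroup/cpu"  and  "cpuacct/hadoop-yarn/container_01/tasks"
    for (String part : resourcesOption.split(",")) {
      System.out.println(part);
    }
  }
}
{code}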
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533089#comment-14533089 ] Vinod Kumar Vavilapalli commented on YARN-2194: --- Also, it will be really useful if we could find a way to support the existing code on RHEL7, even if libcgroup is deprecated and the approach ends up becoming heavy-handed, perhaps with some manual steps. This [page|https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Resource_Management_Guide/chap-Using_libcgroup_Tools.html] talks about ways of doing this. Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2194-1.patch In previous versions of RedHat, we could build custom cgroup hierarchies using the cgconfig command from the libcgroup package. From RedHat 7, the libcgroup package is deprecated and its use is not recommended, since it can easily create conflicts with the default cgroup hierarchy. systemd is provided and recommended for cgroup management instead. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533120#comment-14533120 ] Wangda Tan commented on YARN-3362: -- Hi [~Naganarasimha], Thanks a lot for updating, looks much better now! I still have a few minor comments: *For UI:* 1) Shouldn't Configured Capacity and Configured Max Capacity be a part of Queue Status for Partition...? 2) Not caused by your patch: Absolute Capacity should be Absolute Configured Capacity, and Absolute Max Capacity should be Absolute Configured Max Capacity; could you update them in your patch? 3) Also not caused by your patch: there's no space between Active Users Info and the following queue. I'm not sure if there's any easy fix; please feel free to file a separate ticket if it is hard to solve together with this one. *For implementation:* 1) One minor style comment: you can merge all the capacities-related rendering in CapacitySchedulerPage into a method similar to renderCommonLeafQueueInfo, which lets you merge some of the implementation of {{render}} and {{renderLeafQueueInfoWithoutParition}}. And add a method renderLeafQueueInfoWithParition to make {{render}} look cleaner. *For your question* bq. May be i did not get this completely. label is exclusive-label i have done in CapacitySchedulerPage.QueuesBlock.render(l num - 357) I think for both CapacitySchedulerPage and CapacitySchedulerInfo, it should be: {code} if (label.getIsExclusive() && !((AbstractCSQueue) root).accessibleToPartition(label.getLabelName())) { {code} When the label is exclusive (nobody can use the label unless the queue is accessible to the label) and the queue isn't accessible to the label, we don't need to continue. Let me know your thoughts. CC: [~vinodkv]/[~jianhe]. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, CSWithLabelsView.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, capacity-scheduler.xml We don't have node label usage in the RM CapacityScheduler web UI now; without this, it is hard for users to understand what happened to nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-644: -- Attachment: (was: YARN-644.03.patch) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Varun Saxena Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-644.001.patch, YARN-644.002.patch, YARN-644.03.patch I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-644: -- Attachment: YARN-644.03.patch Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Varun Saxena Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-644.001.patch, YARN-644.002.patch, YARN-644.03.patch I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533174#comment-14533174 ] Naganarasimha G R commented on YARN-3362: - Thanks for the review [~wangda], bq. Also not caused by your patch: there's no space between Active Users Info and the following queue. I'm not sure if there's any easy fix; please feel free to file a separate ticket if it is hard to solve together with this one. I put in a considerable amount of effort, but it didn't work out. Hamlet should have some kind of doc or a book; it seems like an enigma code to me. Will file a separate jira, as the fix is required in many places of the CS page: between Active Users Info and the following queue, between the Dump scheduler log button and the Application Queue, and in the CS Health Block. For the other comments, I will rework ASAP. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, capacity-scheduler.xml We don't have node label usage in the RM CapacityScheduler web UI now; without this, it is hard for users to understand what happened to nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-644: -- Attachment: YARN-644.03.patch Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Varun Saxena Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-644.001.patch, YARN-644.002.patch, YARN-644.03.patch I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-644: -- Attachment: (was: YARN-644.03.patch) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Varun Saxena Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-644.001.patch, YARN-644.002.patch, YARN-644.03.patch I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533179#comment-14533179 ] Jian He commented on YARN-2918: --- looks good overall, minor comments: - simplify below a bit? {code} boolean queueCheck = true; if (queueLabels == null) { queueCheck = false; } else { if (!queueLabels.contains(str) !queueLabels.contains(RMNodeLabelsManager.ANY)) { queueCheck = false; } } if (!queueCheck) { return false; } {code} - test newly added test conditions in TestSchedulerUtils seems like some of them are not being tested. Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Wangda Tan Labels: BB2015-05-TBR Attachments: YARN-2918.1.patch, YARN-2918.2.patch Currently, if admin setup labels on queues {{queue-path.accessible-node-labels = ...}}. And the label is not added to RM, queue's initialization will fail and RM will fail too: {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager ... Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} This is not a good user experience, we should stop fail RM so that admin can configure queue/labels in following steps: - Configure queue (with label) - Start RM - Add labels to RM - Submit applications Now admin has to: - Configure queue (without label) - Start RM - Add labels to RM - Refresh queue's config (with label) - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
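For reference, one way the quoted check could be collapsed, written as a hedged sketch rather than the committed change; it assumes the operator dropped between the two contains() calls in the quote above was intended to be &&.

{code}
// Possible simplification (sketch only): fail when the queue has no labels,
// or has neither the requested label nor ANY.
if (queueLabels == null
    || (!queueLabels.contains(str)
        && !queueLabels.contains(RMNodeLabelsManager.ANY))) {
  return false;
}
{code}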
[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533208#comment-14533208 ] Hadoop QA commented on YARN-3539: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 30s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 57s | Site still builds. | | {color:green}+1{color} | checkstyle | 3m 27s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 1m 38s | The patch has 18 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | common tests | 23m 22s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | | | 76m 42s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731223/YARN-3539-008.patch | | Optional Tests | site javadoc javac unit findbugs checkstyle | | git revision | trunk / daf3e4e | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7768/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7768/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7768/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7768/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7768/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7768/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7768/console | This message was automatically generated. 
Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Labels: BB2015-05-TBR Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch, YARN-3539-006.patch, YARN-3539-007.patch, YARN-3539-008.patch, timeline_get_api_examples.txt The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3589) RM and AH web UI display DOCTYPE
[ https://issues.apache.org/jira/browse/YARN-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533206#comment-14533206 ] Jian He commented on YARN-3589: --- lgtm, thanks Rohith! RM and AH web UI display DOCTYPE Key: YARN-3589 URL: https://issues.apache.org/jira/browse/YARN-3589 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.8.0 Reporter: Rohith Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-YARN-3589.patch, YARN-3589.PNG The RM web app UI displays {{!DOCTYPE html PUBLIC -\/\/W3C\/\/DTD HTML 4.01\/\/EN http:\/\/www.w3.org\/TR\/html4\/strict.dtd}}, which is not necessary. This happens because the content of the html page is escaped, so the browser cannot parse it. Any escaped content should be within the html block, but the doctype is emitted above the html element, where the browser cannot parse it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3593) Modifications in Node Labels Page for Partitions
[ https://issues.apache.org/jira/browse/YARN-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3593: Attachment: YARN-3593.20150507-1.patch NodeLabelsPageModifications.png attaching initial patch and the snapshot for modifications Modifications in Node Labels Page for Partitions Key: YARN-3593 URL: https://issues.apache.org/jira/browse/YARN-3593 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Affects Versions: 2.6.0, 2.7.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: NodeLabelsPageModifications.png, YARN-3593.20150507-1.patch Need to support displaying the label type in Node Labels page and also instead of NO_LABEL we need to show as DEFAULT_PARTITION -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3362: - Attachment: No-space-between-Active_user_info-and-next-queues.png Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, capacity-scheduler.xml We don't have node label usage in RM CapacityScheduler web UI now, without this, user will be hard to understand what happened to nodes have labels assign to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3592) Fix typos in RMNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533159#comment-14533159 ] Sunil G commented on YARN-3592: --- testRMRestartGetApplicationList is passing locally. The test failure is unrelated. New test cases are not needed for this as it's a typo issue. Fix typos in RMNodeLabelsManager Key: YARN-3592 URL: https://issues.apache.org/jira/browse/YARN-3592 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Junping Du Assignee: Sunil G Labels: newbie Attachments: 0001-YARN-3592.patch acccessibleNodeLabels = accessibleNodeLabels in many places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-644: -- Attachment: YARN-644.03.patch Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Varun Saxena Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-644.001.patch, YARN-644.002.patch, YARN-644.03.patch I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533311#comment-14533311 ] Xuan Gong commented on YARN-2268: - Cancel the patch, looks like we need more discussion Disallow formatting the RMStateStore when there is an RM running Key: YARN-2268 URL: https://issues.apache.org/jira/browse/YARN-2268 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-YARN-2268.patch YARN-2131 adds a way to format the RMStateStore. However, it can be a problem if we format the store while an RM is actively using it. It would be nice to fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533310#comment-14533310 ] Hadoop QA commented on YARN-3460: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 22s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 38s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 1s | Tests passed in hadoop-yarn-registry. | | | | 36m 34s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731250/YARN-3460-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ab5058d | | hadoop-yarn-registry test log | https://builds.apache.org/job/PreCommit-YARN-Build/7774/artifact/patchprocess/testrun_hadoop-yarn-registry.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7774/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7774/console | This message was automatically generated. Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM Key: YARN-3460 URL: https://issues.apache.org/jira/browse/YARN-3460 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.6.0 Environment: $ mvn -version Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 2014-02-14T11:37:52-06:00) Maven home: /opt/apache-maven-3.2.1 Java version: 1.7.0, vendor: IBM Corporation Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux, version: 3.10.0-229.ael7b.ppc64le, arch: ppc64le, family: unix Reporter: pascal oliva Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, YARN-3460-2.patch, YARN-3460-3.patch TestSecureRMRegistryOperations failed with JBM IBM JAVA mvn test -X -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations ModuleTotal Failure Error Skipped - hadoop-yarn-registry 12 0 12 0 - Total 12 0 12 0 With javax.security.auth.login.LoginException: Bad JAAS configuration: unrecognized option: isInitiator and Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533074#comment-14533074 ] Hadoop QA commented on YARN-20: --- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | | | 36m 13s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731219/YARN-20.2.patch | | Optional Tests | javadoc javac unit | | git revision | trunk / daf3e4e | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7767/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7767/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7767/console | This message was automatically generated. More information for yarn.resourcemanager.webapp.address in yarn-default.xml -- Key: YARN-20 URL: https://issues.apache.org/jira/browse/YARN-20 Project: Hadoop YARN Issue Type: Improvement Components: documentation, resourcemanager Affects Versions: 2.0.0-alpha Reporter: Nemon Lou Assignee: Bartosz Ługowski Priority: Trivial Labels: BB2015-05-TBR, newbie Attachments: YARN-20.1.patch, YARN-20.2.patch, YARN-20.patch Original Estimate: 1h Remaining Estimate: 1h The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is in host:port format,which is noted in the cluster set up guide (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html). When i read though the code,i find host format is also supported. In host format,the port will be random. So we may add more documentation in yarn-default.xml for easy understood. I will submit a patch if it's helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
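The description notes that yarn.resourcemanager.webapp.address accepts either host:port or a bare host (in which case the port is chosen at random). A hedged illustration using the YarnConfiguration constant that backs this property; the host names below are placeholders:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: both value formats are accepted for yarn.resourcemanager.webapp.address.
public class RmWebappAddressExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.RM_WEBAPP_ADDRESS, "rm.example.com:8088"); // host:port, fixed port
    conf.set(YarnConfiguration.RM_WEBAPP_ADDRESS, "rm.example.com");      // host only, random port
    System.out.println(conf.get(YarnConfiguration.RM_WEBAPP_ADDRESS));
  }
}
{code}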
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533122#comment-14533122 ] Wangda Tan commented on YARN-3362: -- Attached https://issues.apache.org/jira/secure/attachment/12731236/No-space-between-Active_user_info-and-next-queues.png; to showing there's no space between Active User Info and next queue. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, capacity-scheduler.xml We don't have node label usage in RM CapacityScheduler web UI now, without this, user will be hard to understand what happened to nodes have labels assign to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434-branch2.7.patch Attaching patch for branch2.7. [~leftnoteasy] could you take a look when you have a chance? Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.8.0 Attachments: YARN-3434-branch2.7.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0 User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533163#comment-14533163 ] Hadoop QA commented on YARN-3505: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 16s | The applied patch generated 3 new checkstyle issues (total was 70, now 64). | | {color:green}+1{color} | whitespace | 0m 17s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 12s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 5m 58s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 63m 52s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 110m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731215/YARN-3505.2.rebase.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 8e991f4 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7764/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7764/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7764/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7764/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7764/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7764/console | This message was automatically generated. 
Node's Log Aggregation Report with SUCCEED should not cached in RMApps -- Key: YARN-3505 URL: https://issues.apache.org/jira/browse/YARN-3505 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Affects Versions: 2.8.0 Reporter: Junping Du Assignee: Xuan Gong Priority: Critical Attachments: YARN-3505.1.patch, YARN-3505.2.patch, YARN-3505.2.rebase.patch Per discussions in YARN-1402, we shouldn't cache all node's log aggregation reports in RMApps for always, especially for those finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533187#comment-14533187 ] Wangda Tan commented on YARN-3362: -- bq. Put some considerable amount of effort, dint workout. Hamlet should have add some kind of doc or a book, its seems like an enigma code to me. Will file a separate jira as its required in many places of CS page : Active Users Info and following queue, in between Dump scheduler log button and Application Queue , In CS Health Block Thanks for looking at this, that sounds good, we can address of them together in a separated JIRA. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, capacity-scheduler.xml We don't have node label usage in RM CapacityScheduler web UI now, without this, user will be hard to understand what happened to nodes have labels assign to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533156#comment-14533156 ] Wangda Tan commented on YARN-3581: -- Hi [~Naganarasimha], Thanks for trying and thinking about this; assigning to you, please go ahead. #1/#3/#4 make sense to me. For #2, good point, I think we need to add {{..}} for {{label(...),label(...)}}, like {{label(...),label(...)}}. This should be in the usage/help message as well. I'm open to changing ( ) to [ ], but I'm not sure if other shells require escaping [..] too. For now I prefer to keep (...), not adding new syntax, but document it in the help message. In addition, similar to #2, -replaceLabelsOnNode has a problem when the admin specifies multiple host= entries on the same line without using ... I think we should add using ... in the help message as well. If the admin forgets to add ..., we should handle it instead of ignoring host= entries after the first one. Let me know your thoughts. Thanks, Wangda Deprecate -directlyAccessNodeLabelStore in RMAdminCLI - Key: YARN-3581 URL: https://issues.apache.org/jira/browse/YARN-3581 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan In 2.6.0, we added an option called -directlyAccessNodeLabelStore to make RM can start with label-configured queue settings. After YARN-2918, we don't need this option any more, admin can configure queue setting, start RM and configure node label via RMAdminCLI without any error. In addition, this option is very restrictive, first it needs to run on the same node where RM is running if admin configured to store labels in local disk. Second, when admin run the option when RM is running, multiple process write to a same file can happen, this could make node label store becomes invalid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3581: - Assignee: Naganarasimha G R (was: Wangda Tan) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI - Key: YARN-3581 URL: https://issues.apache.org/jira/browse/YARN-3581 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R In 2.6.0, we added an option called -directlyAccessNodeLabelStore to make RM can start with label-configured queue settings. After YARN-2918, we don't need this option any more, admin can configure queue setting, start RM and configure node label via RMAdminCLI without any error. In addition, this option is very restrictive, first it needs to run on the same node where RM is running if admin configured to store labels in local disk. Second, when admin run the option when RM is running, multiple process write to a same file can happen, this could make node label store becomes invalid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3593) Modifications in Node Labels Page for Partitions
[ https://issues.apache.org/jira/browse/YARN-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533234#comment-14533234 ] Hadoop QA commented on YARN-3593: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 53m 40s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 57s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731228/YARN-3593.20150507-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / daf3e4e | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7769/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7769/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7769/console | This message was automatically generated. Modifications in Node Labels Page for Partitions Key: YARN-3593 URL: https://issues.apache.org/jira/browse/YARN-3593 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Affects Versions: 2.6.0, 2.7.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: NodeLabelsPageModifications.png, YARN-3593.20150507-1.patch Need to support displaying the label type in Node Labels page and also instead of NO_LABEL we need to show as DEFAULT_PARTITION -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533233#comment-14533233 ] Hadoop QA commented on YARN-2821: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 49s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 22s | The applied patch generated 6 new checkstyle issues (total was 151, now 155). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 35s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 15m 25s | Tests failed in hadoop-yarn-applications-distributedshell. | | | | 50m 56s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels | | Timed out tests | org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731235/YARN-2821.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / daf3e4e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7770/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7770/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7770/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7770/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 
14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04
[jira] [Commented] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533304#comment-14533304 ] Varun Saxena commented on YARN-644: --- Fixed the tab issue and added tests. Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Varun Saxena Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-644.001.patch, YARN-644.002.patch, YARN-644.03.patch I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
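The YARN-644 comments above describe adding null checks before dereferencing the container token in ContainerManagerImpl.authorizeRequest(). A hedged sketch of that kind of validation; the helper class, method name, and exception message are illustrative and not the actual patch, while the ContainerTokenIdentifier and ContainerId accessors are real YARN APIs:
{code}
import org.apache.hadoop.security.token.SecretManager.InvalidToken;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;

// Illustrative helper, not the YARN-644 patch: fail fast with a clear error
// instead of hitting a NullPointerException deeper in authorizeRequest().
final class ContainerTokenChecks {
  static ApplicationAttemptId requireAttemptId(ContainerTokenIdentifier tokenId)
      throws InvalidToken {
    if (tokenId == null || tokenId.getContainerID() == null
        || tokenId.getContainerID().getApplicationAttemptId() == null) {
      throw new InvalidToken("Malformed container token: missing container/attempt id");
    }
    return tokenId.getContainerID().getApplicationAttemptId();
  }
}
{code}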
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533075#comment-14533075 ] Vinod Kumar Vavilapalli commented on YARN-2194: --- Is there no API that we can use instead of spawning shells to set this up? We should have some auto-detection to choose the right plugin for the right OS. IAC, YARN-3443 changed the way resource isolation code is organized in the ResourceManager. YARN-3542 is migrating existing cgroups+cpu support to the new layout. We need to relook at this patch in light of those changes. Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2194-1.patch In previous versions of RedHat, we could build custom cgroup hierarchies using the cgconfig command from the libcgroup package. From RedHat 7, the libcgroup package is deprecated and its use is not recommended, since it can easily create conflicts with the default cgroup hierarchy. systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
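A hedged sketch of the auto-detection idea raised in the comment above; reading /etc/os-release is only one possible signal, and the returned strategy names are placeholders rather than existing YARN classes:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch only: prefer systemd-managed cgroups when /etc/os-release
// reports a version 7 release (e.g. RHEL/CentOS 7), otherwise fall back to the
// older libcgroup/cgconfig setup.
final class CgroupStrategyDetector {
  static String chooseStrategy() throws IOException {
    Path osRelease = Paths.get("/etc/os-release");
    if (Files.exists(osRelease)
        && new String(Files.readAllBytes(osRelease)).contains("VERSION_ID=\"7")) {
      return "systemd";   // RHEL 7 deprecates libcgroup; let systemd manage the hierarchy
    }
    return "libcgroup";   // older releases: keep the cgconfig-style setup
  }
}
{code}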
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533199#comment-14533199 ] Wangda Tan commented on YARN-2918: -- In addition, Test failure of TestFifoScheduler is not related to the patch (ran it locally). Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Wangda Tan Labels: BB2015-05-TBR Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if admin setup labels on queues {{queue-path.accessible-node-labels = ...}}. And the label is not added to RM, queue's initialization will fail and RM will fail too: {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager ... Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} This is not a good user experience, we should stop fail RM so that admin can configure queue/labels in following steps: - Configure queue (with label) - Start RM - Add labels to RM - Submit applications Now admin has to: - Configure queue (without label) - Start RM - Add labels to RM - Refresh queue's config (with label) - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533389#comment-14533389 ] Jian He commented on YARN-3480: --- [~hex108], thanks for your explanations. I can see this will be a problem for long running apps, as the number of attempts will just keep increasing. To be consistent with the attemptFailureValidityWindow for long running apps, instead of introducing a global hard limit, how about removing the attempt records that are beyond the validity window? Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable Key: YARN-3480 URL: https://issues.apache.org/jira/browse/YARN-3480 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3480.01.patch, YARN-3480.02.patch, YARN-3480.03.patch When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However it will make RMStateStore (FileSystem/HDFS/ZK) store more attempts, and make the RM recovery process much slower. It might be better to set max attempts to be stored in RMStateStore. BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to a small value, the number of retried attempts might be very large. So we need to delete some attempts stored in RMAppImpl and RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
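A hedged sketch of the alternative floated in the comment above: rather than a configurable hard cap, prune stored attempt records whose finish time falls outside the failure-validity window. The map-based bookkeeping and method names below are illustrative, not the RMStateStore or RMAppImpl API:
{code}
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: drop attempt records that finished before the validity
// window began, so only recent attempts count toward (and are kept for) recovery.
final class AttemptPruner {
  static void pruneOutsideValidityWindow(Map<String, Long> attemptFinishTimes,
      long validityIntervalMs, long nowMs) {
    if (validityIntervalMs <= 0) {
      return; // window disabled: keep all records
    }
    Iterator<Map.Entry<String, Long>> it = attemptFinishTimes.entrySet().iterator();
    while (it.hasNext()) {
      if (nowMs - it.next().getValue() > validityIntervalMs) {
        it.remove(); // attempt finished before the window started
      }
    }
  }
}
{code}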
[jira] [Updated] (YARN-3597) Fix the javadoc of DelegationTokenSecretManager in hadoop-hdfs
[ https://issues.apache.org/jira/browse/YARN-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Liptak updated YARN-3597: --- Attachment: YARN-3597.patch Fix the javadoc of DelegationTokenSecretManager in hadoop-hdfs -- Key: YARN-3597 URL: https://issues.apache.org/jira/browse/YARN-3597 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gabor Liptak Priority: Trivial Attachments: YARN-3597.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1329) yarn-config.sh overwrites YARN_CONF_DIR indiscriminately
[ https://issues.apache.org/jira/browse/YARN-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533468#comment-14533468 ] Li Lu commented on YARN-1329: - Quick note: this patch no longer applies to trunk. The current yarn-config.sh does not perform {{export YARN_CONF_DIR}}. We may want to verify if the problem still exists. yarn-config.sh overwrites YARN_CONF_DIR indiscriminately - Key: YARN-1329 URL: https://issues.apache.org/jira/browse/YARN-1329 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Reporter: Aaron Gottlieb Assignee: haosdent Labels: BB2015-05-TBR, easyfix Attachments: YARN-1329.patch The script yarn-daemons.sh calls {code}${HADOOP_LIBEXEC_DIR}/yarn-config.sh{code} yarn-config.sh overwrites any previously set value of the environment variable YARN_CONF_DIR, starting at line 40:
{code:title=yarn-config.sh|borderStyle=solid}
#check to see if the conf dir is given as an optional argument
if [ $# -gt 1 ]
then
  if [ "--config" = "$1" ]
  then
    shift
    confdir=$1
    shift
    YARN_CONF_DIR=$confdir
  fi
fi

# Allow alternate conf dir location.
export YARN_CONF_DIR=${HADOOP_CONF_DIR:-$HADOOP_YARN_HOME/conf}
{code}
The last line should check for the existence of YARN_CONF_DIR first:
{code}
DEFAULT_CONF_DIR=${HADOOP_CONF_DIR:-$YARN_HOME/conf}
export YARN_CONF_DIR=${YARN_CONF_DIR:-$DEFAULT_CONF_DIR}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533524#comment-14533524 ] Chang Li commented on YARN-2421: Updated my patch; the current approach ignores allocate requests when the app attempt is in final_saving or finishing, or isAppFinalStateStored. CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.1 Reporter: Thomas Graves Assignee: Chang Li Attachments: YARN-2421.4.patch, yarn2421.patch, yarn2421.patch, yarn2421.patch I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
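A hedged sketch of the guard described in the comment above; the class and method are placeholders, RMAppAttemptState.FINAL_SAVING and FINISHING are real states, and the finalStateStored flag stands in for the isAppFinalStateStored check mentioned in the comment:
{code}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

// Illustrative guard only, not the YARN-2421 patch: once the attempt has started
// finishing, allocate requests should be ignored by the scheduler.
final class AllocateGuard {
  static boolean shouldIgnoreAllocate(RMAppAttemptState state, boolean finalStateStored) {
    return state == RMAppAttemptState.FINAL_SAVING
        || state == RMAppAttemptState.FINISHING
        || finalStateStored;
  }
}
{code}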
[jira] [Commented] (YARN-3596) Fix the javadoc of DelegationTokenSecretManager in hadoop-common
[ https://issues.apache.org/jira/browse/YARN-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533544#comment-14533544 ] Hadoop QA commented on YARN-3596: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 12s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 1s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 9s | The applied patch generated 1 new checkstyle issues (total was 49, now 50). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 45s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | common tests | 23m 49s | Tests passed in hadoop-common. | | | | 62m 7s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731284/YARN-3596.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b88700d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7781/artifact/patchprocess/diffcheckstylehadoop-common.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7781/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7781/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7781/console | This message was automatically generated. Fix the javadoc of DelegationTokenSecretManager in hadoop-common Key: YARN-3596 URL: https://issues.apache.org/jira/browse/YARN-3596 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gabor Liptak Priority: Trivial Attachments: YARN-3596.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3600) AM container link is broken (on a killed application, at least)
[ https://issues.apache.org/jira/browse/YARN-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533571#comment-14533571 ] Sergey Shelukhin commented on YARN-3600: [~vinodkv] ran into this recently... can you guys take a look AM container link is broken (on a killed application, at least) --- Key: YARN-3600 URL: https://issues.apache.org/jira/browse/YARN-3600 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Sergey Shelukhin Running some fairly recent (couple weeks ago) version of 2.8.0-SNAPSHOT. I have an application that ran fine for a while and then I yarn kill-ed it. Now when I go to the only app attempt URL (like so: http://(snip RM host name):8088/cluster/appattempt/appattempt_1429683757595_0795_01) I see: AM Container: container_1429683757595_0795_01_01 Node: N/A and the container link goes to {noformat}http://(snip RM host name):8088/cluster/N/A {noformat} which obviously doesn't work -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3600) AM container link is broken (on a killed application, at least)
[ https://issues.apache.org/jira/browse/YARN-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated YARN-3600: --- Affects Version/s: 2.8.0 AM container link is broken (on a killed application, at least) --- Key: YARN-3600 URL: https://issues.apache.org/jira/browse/YARN-3600 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Sergey Shelukhin Running some fairly recent (couple weeks ago) version of 2.8.0-SNAPSHOT. I have an application that ran fine for a while and then I yarn kill-ed it. Now when I go to the only app attempt URL (like so: http://(snip RM host name):8088/cluster/appattempt/appattempt_1429683757595_0795_01) I see: AM Container: container_1429683757595_0795_01_01 Node: N/A and the container link goes to {noformat}http://(snip RM host name):8088/cluster/N/A {noformat} which obviously doesn't work -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3600) AM container link is broken (on a killed application, at least)
[ https://issues.apache.org/jira/browse/YARN-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated YARN-3600: --- Description: Running some fairly recent (couple weeks ago) version of 2.8.0-SNAPSHOT. I have an application that ran fine for a while and then I yarn kill-ed it. Now when I go to the only app attempt URL (like so: http://(snip RM host name):8088/cluster/appattempt/appattempt_1429683757595_0795_01) I see: AM Container: container_1429683757595_0795_01_01 Node: N/A and the container URL is {noformat}http://(snip RM host name):8088/cluster/N/A {noformat} which obviously doesn't work was: Running some fairly recent (couple weeks ago) version of 2.8.0-SNAPSHOT. I have an application that ran fine for a while and then I yarn kill-ed it. Now when I go to the only app attempt URL (like so: http://(snip):8088/cluster/appattempt/appattempt_1429683757595_0795_01) I see: AM Container: container_1429683757595_0795_01_01 Node: N/A and the container URL is {noformat}http://cn042-10.l42scl.hortonworks.com:8088/cluster/N/A {noformat} which obviously doesn't work AM container link is broken (on a killed application, at least) --- Key: YARN-3600 URL: https://issues.apache.org/jira/browse/YARN-3600 Project: Hadoop YARN Issue Type: Bug Reporter: Sergey Shelukhin Running some fairly recent (couple weeks ago) version of 2.8.0-SNAPSHOT. I have an application that ran fine for a while and then I yarn kill-ed it. Now when I go to the only app attempt URL (like so: http://(snip RM host name):8088/cluster/appattempt/appattempt_1429683757595_0795_01) I see: AM Container: container_1429683757595_0795_01_01 Node: N/A and the container URL is {noformat}http://(snip RM host name):8088/cluster/N/A {noformat} which obviously doesn't work -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3600) AM container link is broken (on a killed application, at least)
[ https://issues.apache.org/jira/browse/YARN-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated YARN-3600: --- Description: Running some fairly recent (couple weeks ago) version of 2.8.0-SNAPSHOT. I have an application that ran fine for a while and then I yarn kill-ed it. Now when I go to the only app attempt URL (like so: http://(snip RM host name):8088/cluster/appattempt/appattempt_1429683757595_0795_01) I see: AM Container: container_1429683757595_0795_01_01 Node: N/A and the container link goes to {noformat}http://(snip RM host name):8088/cluster/N/A {noformat} which obviously doesn't work was: Running some fairly recent (couple weeks ago) version of 2.8.0-SNAPSHOT. I have an application that ran fine for a while and then I yarn kill-ed it. Now when I go to the only app attempt URL (like so: http://(snip RM host name):8088/cluster/appattempt/appattempt_1429683757595_0795_01) I see: AM Container: container_1429683757595_0795_01_01 Node: N/A and the container URL is {noformat}http://(snip RM host name):8088/cluster/N/A {noformat} which obviously doesn't work AM container link is broken (on a killed application, at least) --- Key: YARN-3600 URL: https://issues.apache.org/jira/browse/YARN-3600 Project: Hadoop YARN Issue Type: Bug Reporter: Sergey Shelukhin Running some fairly recent (couple weeks ago) version of 2.8.0-SNAPSHOT. I have an application that ran fine for a while and then I yarn kill-ed it. Now when I go to the only app attempt URL (like so: http://(snip RM host name):8088/cluster/appattempt/appattempt_1429683757595_0795_01) I see: AM Container: container_1429683757595_0795_01_01 Node: N/A and the container link goes to {noformat}http://(snip RM host name):8088/cluster/N/A {noformat} which obviously doesn't work -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3134: -- Attachment: hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out I played with the latest patch locally, but I found ConcurrentModificationException issue is still not resolved. I attached my NM log. Please take a look. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134DataSchema.pdf, hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
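Since the description quotes Phoenix's pitch as a client-embedded JDBC driver that compiles SQL into HBase scans, here is a hedged illustration of that access pattern; the ZooKeeper quorum, table, and column names are placeholders and not the YARN-3134 schema:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustration only: plain JDBC against Phoenix; Phoenix turns the query into HBase scans.
public class PhoenixReadExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
         PreparedStatement ps = conn.prepareStatement(
             "SELECT entity_id, created_time FROM timeline_entity WHERE entity_type = ?")) {
      ps.setString(1, "YARN_APPLICATION");
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          System.out.println(rs.getString(1) + " " + rs.getLong(2));
        }
      }
    }
  }
}
{code}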
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533604#comment-14533604 ] Junping Du commented on YARN-3411: -- Thank you, [~vrushalic]! :) [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533731#comment-14533731 ] Jun Gong commented on YARN-3480: [~jianhe] just to make sure I understand your suggestion. Do you mean that the configurable limit on max attempts stored in RMAppImpl and RMStateStore is not needed at all, and we could just remove the attempt records that are beyond the validity window? If so, I will update the patch. Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable Key: YARN-3480 URL: https://issues.apache.org/jira/browse/YARN-3480 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3480.01.patch, YARN-3480.02.patch, YARN-3480.03.patch When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, that makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the number of attempts stored in RMStateStore configurable. BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to a small value, the number of retried attempts might be very large, so we need to delete some of the attempts stored in RMAppImpl and RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
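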
[jira] [Comment Edited] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533594#comment-14533594 ] Zhijie Shen edited comment on YARN-3134 at 5/7/15 11:47 PM: I played with the latest patch locally, but I found ConcurrentModificationException issue is still not resolved. I attached my NM log. Please take a look. My local environment: 1 HBase 0.98.11; 2. Phoenix 4.3.1 was (Author: zjshen): I played with the latest patch locally, but I found ConcurrentModificationException issue is still not resolved. I attached my NM log. Please take a look. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134DataSchema.pdf, hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message
[ https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533635#comment-14533635 ] Hudson commented on YARN-3584: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #178 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/178/]) YARN-3584. Fixed attempt diagnostics format shown on the UI. Contributed by nijel (jianhe: rev b88700dcd0b9aa47662009241dfb83bc4446548d) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java [Log mesage correction] : MIssing space in Diagnostics message -- Key: YARN-3584 URL: https://issues.apache.org/jira/browse/YARN-3584 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3584-1.patch, YARN-3584-2.patch For more detailed output, check application tracking page: https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color}, click on links to logs of each attempt. Here, "Then" is not part of the URL; it is better to put a space in between so that the URL can be copied directly for analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
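The YARN-3584 fix amounts to separating the tracking URL from the sentence that follows it in the diagnostics string. The sketch below only illustrates that idea with a hypothetical builder method; it is not the actual RMAppAttemptImpl change, and the URL passed in is an invented example.
{code}
// Minimal sketch of the kind of change involved: keep a space (or newline) between
// the tracking URL and the next sentence so the URL stays copy-pastable.
public class DiagnosticsMessageSketch {
  static String buildDiagnostics(String trackingUrl) {
    StringBuilder sb = new StringBuilder();
    sb.append("For more detailed output, check application tracking page: ");
    sb.append(trackingUrl);
    // The separator before "Then" is the whole fix: without it the word is glued
    // to the URL and gets copied as part of it.
    sb.append(" Then, click on links to logs of each attempt.");
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(buildDiagnostics(
        "https://example-host:26001/cluster/app/application_0000000000000_0000"));
  }
}
{code}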
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533637#comment-14533637 ] Hudson commented on YARN-3448: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #178 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/178/]) YARN-3448. Added a rolling time-to-live LevelDB timeline store implementation. Contributed by Jonathan Eagles. (zjshen: rev daf3e4ef8bf73cbe4a799d51b4765809cd81089f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestRollingLevelDBTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/RollingLevelDB.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestRollingLevelDB.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/RollingLevelDBTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/util/LeveldbUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Fix For: 2.8.0 Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.12.patch, YARN-3448.13.patch, YARN-3448.14.patch, YARN-3448.15.patch, YARN-3448.16.patch, YARN-3448.17.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities record at a time. An exclusive write lock is held during the entire deletion phase which in practice can be hours. If we are to relax some of the consistency constraints, other performance enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize the read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. 
This can also help with I/O, since the entity and index databases can be placed on separate disks. Rolling DBs for the entity and index DBs: 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always place an entity's events into its correct rolling DB instance based on start time. This allows us to stitch the data back together while reading, with artificial paging. Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster. Prefer sequential writes. Sequential writes can be several times faster than random writes, so spend some small effort arranging the writes in a way that trends towards sequential write performance over random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
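The rolling-database idea in the YARN-3448 description can be pictured with a small sketch: every write is routed to the rolling instance that owns the entity's start time, reads stitch results back together, and ageing out drops whole instances rather than deleting records one at a time. This is only an illustration with in-memory maps standing in for LevelDB instances; it is not the actual RollingLevelDB/RollingLevelDBTimelineStore code, and the roll period constant is an assumed value.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

// Toy sketch of a rolling store: one "database" per time period, writes always go to
// the instance owning the entity's start time, and eviction removes whole instances
// (the analogue of deleting a LevelDB directory from the file system).
public class RollingStoreSketch {

  /** Length of one rolling period; the real store would make this configurable. */
  private static final long ROLL_PERIOD_MS = TimeUnit.HOURS.toMillis(1);

  /** Rolling instances keyed by the start of the period they cover. */
  private final TreeMap<Long, Map<String, String>> instances = new TreeMap<>();

  private long periodStart(long startTimeMs) {
    return startTimeMs - (startTimeMs % ROLL_PERIOD_MS);
  }

  /** Always write an entity's data into the instance owning its start time. */
  public void put(String entityId, long startTimeMs, String value) {
    instances.computeIfAbsent(periodStart(startTimeMs), k -> new HashMap<>())
        .put(entityId, value);
  }

  /** Reads look up the owning instance and stitch results back together. */
  public String get(String entityId, long startTimeMs) {
    Map<String, String> db = instances.get(periodStart(startTimeMs));
    return db == null ? null : db.get(entityId);
  }

  /** Ageing out drops whole instances instead of deleting record by record. */
  public void evictOlderThan(long ttlMs, long nowMs) {
    instances.headMap(periodStart(nowMs - ttlMs), false).clear();
  }
}
{code}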
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533656#comment-14533656 ] Zhijie Shen commented on YARN-3134: --- Noticed pom.xml is using 4.3.0 Phoenix. I retried with this version, the problem still happened. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134DataSchema.pdf, hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3599) Fix the javadoc of DelegationTokenSecretManager in hadoop-yarn
[ https://issues.apache.org/jira/browse/YARN-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533664#comment-14533664 ] Hadoop QA commented on YARN-3599: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 40s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 49s | The applied patch generated 1 new checkstyle issues (total was 8, now 6). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 5s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 3m 8s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 52m 14s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 93m 3s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731288/YARN-3599.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4d9f9e5 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7784/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7784/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7784/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7784/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7784/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7784/console | This message was automatically generated. Fix the javadoc of DelegationTokenSecretManager in hadoop-yarn -- Key: YARN-3599 URL: https://issues.apache.org/jira/browse/YARN-3599 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gabor Liptak Priority: Trivial Attachments: YARN-3599.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533670#comment-14533670 ] Li Lu commented on YARN-3134: - Sure, I'll make those clean up changes. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134DataSchema.pdf, hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533671#comment-14533671 ] Li Lu commented on YARN-3134: - Looking into this. Seems like we need more fine tunes for the synchronizations. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134DataSchema.pdf, hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2421: --- Affects Version/s: (was: 2.4.1) CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Thomas Graves Assignee: Chang Li Attachments: YARN-2421.4.patch, yarn2421.patch, yarn2421.patch, yarn2421.patch I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533716#comment-14533716 ] Hudson commented on YARN-2918: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7764 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7764/]) YARN-2918. RM should not fail on startup if queue's configured labels do not exist in cluster-node-labels. Contributed by Wangda Tan (jianhe: rev f489a4ec969f3727d03c8e85d51af1018fc0b2a1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if admin setup labels on queues {{queue-path.accessible-node-labels = ...}}. And the label is not added to RM, queue's initialization will fail and RM will fail too: {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager ... Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. 
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} This is not a good user experience; we should stop failing the RM so that the admin can configure queues/labels in the following steps: - Configure queue (with label) - Start RM - Add labels to RM - Submit applications Currently the admin has to: - Configure queue (without label) - Start RM - Add labels to RM - Refresh queue's config (with label) - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
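The behaviour change YARN-2918 asks for is essentially to downgrade the startup-time label check shown in the stack trace above from a hard failure to a tolerant warning, so labels can be added to the RM afterwards. The following is a rough sketch of that idea only; the helper method and its parameters are hypothetical and do not reflect the actual SchedulerUtils/AbstractCSQueue patch.
{code}
import java.io.IOException;
import java.util.Set;

// Rough sketch: instead of throwing (and taking the RM down) when a queue references
// a label that is not yet in the cluster's node-label manager, log and continue so
// the label can be added later.
public class QueueLabelCheckSketch {

  public static void checkQueueLabels(String queuePath, Set<String> queueLabels,
      Set<String> clusterLabels, boolean failOnMissing) throws IOException {
    for (String label : queueLabels) {
      if (!clusterLabels.contains(label)) {
        String msg = "NodeLabelManager doesn't include label = " + label
            + " configured on queue " + queuePath;
        if (failOnMissing) {
          throw new IOException(msg);        // old behaviour: RM fails to start
        }
        System.err.println("WARN: " + msg);  // tolerant behaviour: warn and go on
      }
    }
  }
}
{code}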
[jira] [Commented] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533727#comment-14533727 ] Hadoop QA commented on YARN-2421: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 51s | The applied patch generated 2 new checkstyle issues (total was 30, now 31). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 52m 19s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 88m 41s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731296/YARN-2421.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4d9f9e5 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7785/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7785/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7785/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7785/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7785/console | This message was automatically generated. CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Thomas Graves Assignee: Chang Li Attachments: YARN-2421.4.patch, yarn2421.patch, yarn2421.patch, yarn2421.patch I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2918: -- Labels: (was: BB2015-05-TBR) Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if admin setup labels on queues {{queue-path.accessible-node-labels = ...}}. And the label is not added to RM, queue's initialization will fail and RM will fail too: {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager ... Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} This is not a good user experience, we should stop fail RM so that admin can configure queue/labels in following steps: - Configure queue (with label) - Start RM - Add labels to RM - Submit applications Now admin has to: - Configure queue (without label) - Start RM - Add labels to RM - Refresh queue's config (with label) - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3597) Fix the javadoc of DelegationTokenSecretManager in hadoop-hdfs
[ https://issues.apache.org/jira/browse/YARN-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533710#comment-14533710 ] Hadoop QA commented on YARN-3597: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 0s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 40s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 7s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 28s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 165m 32s | Tests failed in hadoop-hdfs. | | | | 208m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.tracing.TestTraceAdmin | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731285/YARN-3597.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b88700d | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-YARN-Build/7780/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7780/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7780/console | This message was automatically generated. Fix the javadoc of DelegationTokenSecretManager in hadoop-hdfs -- Key: YARN-3597 URL: https://issues.apache.org/jira/browse/YARN-3597 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gabor Liptak Priority: Trivial Attachments: YARN-3597.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533711#comment-14533711 ] Hadoop QA commented on YARN-1427: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731303/YARN-1427-trunk.2.patch | | Optional Tests | | | git revision | trunk / f489a4e | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7787/console | This message was automatically generated. yarn-env.cmd should have the analog comments that are in yarn-env.sh Key: YARN-1427 URL: https://issues.apache.org/jira/browse/YARN-1427 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Zhijie Shen Labels: BB2015-05-TBR, newbie, windows Attachments: YARN-1427-trunk.2.patch, YARN-1427.1.patch There are paragraphs about the RM/NM env vars (probably AHS as well soon) in yarn-env.sh. Should the Windows version of the script provide similar comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3276) Refactor and fix null casting in some map cast for TimelineEntity (old and new)
[ https://issues.apache.org/jira/browse/YARN-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533742#comment-14533742 ] Hadoop QA commented on YARN-3276: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 48s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 1m 23s | The applied patch generated 13 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 16s | The applied patch generated 3 new checkstyle issues (total was 76, now 79). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 56s | The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 42m 37s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-api | | | org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric$1.compare(Long, Long) negates the return value of Long.compareTo(Long) At TimelineMetric.java:value of Long.compareTo(Long) At TimelineMetric.java:[line 47] | | FindBugs | module:hadoop-yarn-common | | | Inconsistent synchronization of org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.builder; locked 92% of time Unsynchronized access at AllocateResponsePBImpl.java:92% of time Unsynchronized access at AllocateResponsePBImpl.java:[line 391] | | | Inconsistent synchronization of org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.proto; locked 94% of time Unsynchronized access at AllocateResponsePBImpl.java:94% of time Unsynchronized access at AllocateResponsePBImpl.java:[line 391] | | | Inconsistent synchronization of org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.viaProto; locked 94% of time Unsynchronized access at AllocateResponsePBImpl.java:94% of time Unsynchronized access at AllocateResponsePBImpl.java:[line 391] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731302/YARN-3276-YARN-2928.v3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / d4a2362 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/7786/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7786/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7786/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7786/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html | | Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7786/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7786/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7786/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7786/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7786/console | This message was automatically generated. Refactor and fix null casting in some map cast for TimelineEntity (old and new) --- Key: YARN-3276 URL: https://issues.apache.org/jira/browse/YARN-3276 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3276-YARN-2928.v3.patch, YARN-3276-v2.patch, YARN-3276-v3.patch, YARN-3276.patch
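One of the Findbugs items in the QA report above flags a comparator that negates the return value of Long.compareTo. Findbugs warns about this pattern because, for a general compareTo, the result may be Integer.MIN_VALUE, whose negation overflows; the usual fix is to swap the operands or use a reversed comparator. The sketch below only illustrates that general fix and is not the actual TimelineMetric code.
{code}
import java.util.Comparator;

// Sketch of the usual remedy for the "negates the return value of compareTo"
// Findbugs warning: express descending order by swapping operands (or using
// Comparator.reverseOrder()) rather than negating the result.
public class DescendingLongComparators {
  // Flagged pattern: negating the result of compareTo.
  static final Comparator<Long> FLAGGED = (o1, o2) -> -o1.compareTo(o2);

  // Warning-free equivalents:
  static final Comparator<Long> SWAPPED = (o1, o2) -> o2.compareTo(o1);
  static final Comparator<Long> REVERSED = Comparator.reverseOrder();

  public static void main(String[] args) {
    System.out.println(SWAPPED.compare(1L, 2L)); // positive: 2 sorts before 1
  }
}
{code}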
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533600#comment-14533600 ] Junping Du commented on YARN-3134: -- bq. We've also got some code cleanup work to do, but I put them in a priority lower than getting the performance evaluation done for now. I disagree. Actually, it usually takes more time for reviewer to identify these code style issues than simply fix it. Please respect the effort unless you have different opinions against these comments. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134DataSchema.pdf, hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience
[ https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533638#comment-14533638 ] Hudson commented on YARN-3523: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #178 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/178/]) YARN-3523. Cleanup ResourceManagerAdministrationProtocol interface audience. Contributed by Naganarasimha G R (junping_du: rev 8e991f4b1d7226fdcd75c5dc9fe6e5ce721679b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java * hadoop-yarn-project/CHANGES.txt Cleanup ResourceManagerAdministrationProtocol interface audience Key: YARN-3523 URL: https://issues.apache.org/jira/browse/YARN-3523 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Labels: newbie Fix For: 2.8.0 Attachments: YARN-3523.20150422-1.patch, YARN-3523.20150504-1.patch, YARN-3523.20150505-1.patch I noticed ResourceManagerAdministrationProtocol has @Private audience for the class and @Public audience for methods. It doesn't make sense to me. We should make class audience and methods audience consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533632#comment-14533632 ] Wangda Tan commented on YARN-1680: -- Had some offline discussion with [~jianhe] and [~cwelch]. Some takeaways from my side: - If we want an accurate headroom for an application that has blacklisted nodes, it looks unavoidable to compute sum(app.blacklisted_nodes.avail) while calculating the headroom for the app. This requires that when a node heartbeats with changed available resources, all apps that blacklisted the node are notified; when there are lots of applications blacklisting a large number of nodes, a performance regression could happen. - If we consider sum(app.blacklisted_nodes.total) instead of sum(app.blacklisted_nodes.avail), the headroom for an app could be underestimated; this could lead to an app with blacklisted nodes always receiving 0 headroom in a large cluster with high resource utilization (like 99%). Some fallback strategies: # Only do the accurate headroom calculation when there are not too many blacklisted nodes or apps with blacklisted nodes. # Tolerate underestimation of headroom. Some alternatives: - MAPREDUCE-6302 is targeting preempting reducers even if we report an inaccurate headroom for apps. That approach looks good to me. - Move the headroom calculation to the application side; I think we cannot do that, at least for now. An application will only receive an updated NodeReport when a node changes healthy status, not on every regular heartbeat, and we cannot send that much data to the AM during heartbeats. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Craig Welch Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A job is running; its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 Maps got killed), and the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, headRoom includes the blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the returned availableResources considers the whole cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
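The two deduction options Wangda contrasts above (subtracting blacklisted nodes' available resources versus their total capacity) can be shown side by side with a toy sketch. The Node type and memory-only arithmetic below are simplifications for illustration; this is not the CapacityScheduler's Resource/headroom code.
{code}
import java.util.Collection;

// Toy sketch of the two headroom adjustments discussed above. Deducting the
// blacklisted nodes' *available* resources tracks reality but must be recomputed as
// nodes heartbeat; deducting their *total* capacity is cheap but underestimates.
public class HeadroomSketch {

  static final class Node {
    final long totalMb;
    volatile long usedMb;
    Node(long totalMb, long usedMb) { this.totalMb = totalMb; this.usedMb = usedMb; }
    long availMb() { return totalMb - usedMb; }
  }

  /** Accurate option: clusterHeadroom - sum(avail) over the app's blacklisted nodes. */
  static long headroomUsingAvail(long clusterHeadroomMb, Collection<Node> blacklisted) {
    long deduction = blacklisted.stream().mapToLong(Node::availMb).sum();
    return Math.max(0, clusterHeadroomMb - deduction);
  }

  /** Conservative option: deduct blacklisted nodes' total capacity; may report 0 too often. */
  static long headroomUsingTotal(long clusterHeadroomMb, Collection<Node> blacklisted) {
    long deduction = blacklisted.stream().mapToLong(n -> n.totalMb).sum();
    return Math.max(0, clusterHeadroomMb - deduction);
  }
}
{code}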
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533684#comment-14533684 ] Craig Welch commented on YARN-1680: --- bq. This requires when a node doing heartbeat with changed available resource, all apps blacklisted the node need to be notified Well, that's not quite so. From what we were talking about, it means that the blacklist deduction can't be a fixed amount but that it needs to be calculated by looking at the unused resource of the blacklisted nodes during headroom calculation. The rest of the above proposal for detecting changes, etc, works, but instead of a static deduction value we would need a reference to the blacklisted nodes for the app and look at their unused resources during the apps headroom calculation, so there is that cost, but it's not related to the heartbeat or a notification as such bq. headroom for app could be under estimated I think, generally, we should not take an approach which will underestimate/underutilize if we have 6302 to fall back on. If we don't, then we might want to do it only if we decide not to do the accurate calculation in some cases based on limits (see immediately below), but not as a matter of course. bq. Only do accurate headroom calculation when there're not too much blacklisted nodes as well as apps with blacklisted nodes. I think if we put a limit on it, it should be a purely local decision, to only do the calculation with x blacklisted nodes for an app, which we would expect to rarely be an issue. There is a potential for performance issues here, but we don't really know how great a concern it is. bq. MAPREDUCE-6302 is targeting to preempt reducer even if we reported inaccurate headroom for apps. I think the approach looks good to me I think that may work as a fallback option for MR, assuming it works out without issue, if we decide to not do the proper headroom calculation in some cases, but that's MR specific so it won't help non MR apps, and it has the issues I brought up before with performance degradation vs the proper headroom calculation. For these reasons I don't think it's a substitute for fixing this issue overall, it may be a fallback option if we limit the cases where we do the proper adjustment. bq. Move headroom calculation to application side, I think now we cannot do it at least for now...Application will only receive updated NodeReport from when node changes heathy status instead of regular heartbeat Well, in some sense that works OK for this because we really only need to know about those changes in node status status wrt the blacklist to detect recalculation changes with the approach proposed above. The problem is that we will also need a way to query for current usage per node while doing the calculation, I don't know if an efficient call for that exists (it would ideally be batch for N nodes where we would ask for all the blacklisted nodes at once.) There is also the broader issue that we don't seem to have a single entry point client-side for doing this right now, so we would need to touch a few points to add a library/something of that nature to do this, and for AM's we may not be aware of/that are not part of the core, they would have to potentially do some integration to get this. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. 
-- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Craig Welch Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each.Total cluster capacity is 32GB.Cluster slow start is set to 1. Job is running reducer task occupied 29GB of cluster.One NodeManager(NM-4) is become unstable(3 Map got killed), MRAppMaster blacklisted unstable NodeManager(NM-4). All reducer task are running in cluster now. MRAppMaster does not preempt the reducers because for Reducer preemption calculation, headRoom is considering blacklisted nodes memory. This makes jobs to hang forever(ResourceManager does not assing any new containers on blacklisted nodes but returns availableResouce considers cluster free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533697#comment-14533697 ] Sangjin Lee commented on YARN-3044: --- Sorry Naga it's taking a while. I'm going to review it tonight and comment. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Labels: BB2015-05-TBR Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533705#comment-14533705 ] Jun Gong commented on YARN-3480: [~jianhe], thanks for your comments and suggestion. The latest patch (YARN-3480.03.patch) already works as expected: it just removes the attempt records that are beyond the validity window. For the special case where the validity window is -1, it might be better to remove attempts until the number of attempts is less than the hard limit, because yarn.resourcemanager.am.max-attempts might be very large (as in our scenario). What's your opinion? Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable Key: YARN-3480 URL: https://issues.apache.org/jira/browse/YARN-3480 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3480.01.patch, YARN-3480.02.patch, YARN-3480.03.patch When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, that makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the number of attempts stored in RMStateStore configurable. BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to a small value, the number of retried attempts might be very large, so we need to delete some of the attempts stored in RMAppImpl and RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
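The pruning policy described in Jun Gong's comment above (drop attempt records that fall outside the failures-validity window, and when the window is -1 enforce only a hard cap) can be sketched as follows. The AttemptRecord type and the method shape are hypothetical; this is an illustration of the policy, not the RMAppImpl/RMStateStore patch.
{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Toy sketch: attempts are kept oldest-first and pruned from the head, either by the
// validity window or, when the window is -1, by a hard limit on stored attempts.
public class AttemptPruningSketch {

  static final class AttemptRecord {
    final String attemptId;
    final long finishTimeMs;
    AttemptRecord(String attemptId, long finishTimeMs) {
      this.attemptId = attemptId;
      this.finishTimeMs = finishTimeMs;
    }
  }

  static void prune(Deque<AttemptRecord> attempts, long validityWindowMs,
      int hardLimit, long nowMs) {
    if (validityWindowMs > 0) {
      // Remove records whose finish time fell out of the validity window.
      while (!attempts.isEmpty()
          && nowMs - attempts.peekFirst().finishTimeMs > validityWindowMs) {
        attempts.removeFirst();
      }
    } else {
      // Window of -1: only enforce the hard limit on the number of stored attempts.
      while (attempts.size() > hardLimit) {
        attempts.removeFirst();
      }
    }
  }

  public static void main(String[] args) {
    Deque<AttemptRecord> d = new ArrayDeque<>();
    d.add(new AttemptRecord("attempt_1", 1_000L));
    d.add(new AttemptRecord("attempt_2", 9_000L));
    prune(d, 5_000L, 100, 10_000L);
    System.out.println(d.size()); // 1: attempt_1 fell outside the 5s window
  }
}
{code}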
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533329#comment-14533329 ] Hadoop QA commented on YARN-2918: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 47s | The applied patch generated 5 new checkstyle issues (total was 363, now 367). | | {color:red}-1{color} | whitespace | 0m 12s | The patch has 28 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 52m 56s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 22s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731243/YARN-2918.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ab5058d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7772/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7772/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7772/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7772/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7772/console | This message was automatically generated. Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Wangda Tan Labels: BB2015-05-TBR Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if admin setup labels on queues {{queue-path.accessible-node-labels = ...}}. And the label is not added to RM, queue's initialization will fail and RM will fail too: {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager ... Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. 
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} This is not a good user experience, we should stop fail RM so that admin can configure queue/labels in following steps: - Configure queue (with label) - Start RM -
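Editor's note: the change being discussed is to tolerate a queue label that has not yet been added to the cluster node-label collection instead of aborting RM startup. A minimal sketch of that behaviour follows; it is illustrative only (the method and logger names are invented) and is not the YARN-2918 patch.
{code}
import java.util.Set;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch, not the YARN-2918 patch.
public class QueueLabelCheck {
  private static final Logger LOG = LoggerFactory.getLogger(QueueLabelCheck.class);

  /**
   * Instead of throwing IOException("NodeLabelManager doesn't include label ...")
   * during queue initialization, log a warning and let the RM come up, so the
   * admin can add the label afterwards.
   */
  static boolean isLabelUsable(String queue, String label, Set<String> clusterLabels) {
    if (clusterLabels.contains(label)) {
      return true;
    }
    LOG.warn("Queue {} is configured with label '{}' which is not in the cluster"
        + " node-label collection yet; continuing RM startup.", queue, label);
    return false;
  }
}
{code}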
[jira] [Updated] (YARN-3358) Audit log not present while refreshing Service ACLs'
[ https://issues.apache.org/jira/browse/YARN-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3358: --- Attachment: YARN-3358.01.patch Audit log not present while refreshing Service ACLs' Key: YARN-3358 URL: https://issues.apache.org/jira/browse/YARN-3358 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Minor Attachments: YARN-3358.01.patch There should be a success audit log in AdminService#refreshServiceAcls -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533393#comment-14533393 ] Li Lu commented on YARN-3134: - I just opened YARN-3595 to trace all connection cache related discussions. I wrote a summary for the background of this problem. Please feel free to add more. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Labels: BB2015-05-TBR Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134DataSchema.pdf Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
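Editor's note: for readers unfamiliar with Phoenix, the "client-embedded JDBC driver" phrasing above means the HBase backend is reached through plain JDBC. The sketch below shows the general shape of such access; the table and column names are invented for illustration and are not the YARN-3134 schema.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Generic Phoenix-over-JDBC sketch; table/column names are illustrative only.
public class PhoenixWriteSketch {
  public static void main(String[] args) throws SQLException {
    // "localhost" stands in for the ZooKeeper quorum of the HBase cluster.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
         PreparedStatement ps = conn.prepareStatement(
             "UPSERT INTO timeline_entity (entity_id, created_time) VALUES (?, ?)")) {
      ps.setString(1, "entity_0001");
      ps.setLong(2, System.currentTimeMillis());
      ps.executeUpdate();
      conn.commit();  // Phoenix connections do not auto-commit by default
    }
  }
}
{code}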
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533397#comment-14533397 ] Hudson commented on YARN-3448: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #188 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/188/]) YARN-3448. Added a rolling time-to-live LevelDB timeline store implementation. Contributed by Jonathan Eagles. (zjshen: rev daf3e4ef8bf73cbe4a799d51b4765809cd81089f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestRollingLevelDB.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/util/LeveldbUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/RollingLevelDBTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/RollingLevelDB.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestRollingLevelDBTimelineStore.java Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Fix For: 2.8.0 Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.12.patch, YARN-3448.13.patch, YARN-3448.14.patch, YARN-3448.15.patch, YARN-3448.16.patch, YARN-3448.17.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities record at a time. An exclusive write lock is held during the entire deletion phase which in practice can be hours. If we are to relax some of the consistency constraints, other performance enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize the read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. 
This can also help with I/O by placing the entity and index databases on separate disks. Use rolling DBs for the entity and index DBs: 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can replace record-at-a-time DB removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint that an entity's events always go into the correct rolling DB instance based on start time. This allows us to stitch the data back together while reading, with artificial paging. Relax the synchronous write constraints: if we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster. Prefer sequential writes: sequential writes can be several times faster than random writes, so spend some small effort arranging the writes in a way that trends towards sequential rather than random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
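Editor's note: the key constraint in the description, always writing an entity's events to the rolling DB instance that owns its start time so that expired data can be dropped by deleting whole databases, can be sketched as below. This illustrates the idea only and is not the committed RollingLevelDB code; the roll period and naming scheme are invented.
{code}
import java.util.Map;
import java.util.TreeMap;

// Illustration of the routing idea, not the committed RollingLevelDB implementation.
public class RollingDbRouter {
  private final long rollPeriodMs;                       // e.g. one DB per hour
  private final Map<Long, String> dbByPeriodStart = new TreeMap<>();

  RollingDbRouter(long rollPeriodMs) {
    this.rollPeriodMs = rollPeriodMs;
  }

  /** Map an entity start time to the rolling DB instance that owns it. */
  String dbFor(long entityStartTime) {
    long periodStart = entityStartTime - (entityStartTime % rollPeriodMs);
    return dbByPeriodStart.computeIfAbsent(periodStart, p -> "entitydb-" + p);
  }

  /** Retire every DB whose whole period is older than the retention window. */
  void evictOlderThan(long now, long ttlMs) {
    dbByPeriodStart.entrySet()
        .removeIf(e -> e.getKey() + rollPeriodMs < now - ttlMs);
    // in the real store this is the point where DB files are removed from disk,
    // replacing record-at-a-time deletes
  }
}
{code}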
[jira] [Commented] (YARN-3584) [Log mesage correction] : MIssing space in Diagnostics message
[ https://issues.apache.org/jira/browse/YARN-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533394#comment-14533394 ] Hudson commented on YARN-3584: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #188 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/188/]) YARN-3584. Fixed attempt diagnostics format shown on the UI. Contributed by nijel (jianhe: rev b88700dcd0b9aa47662009241dfb83bc4446548d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/CHANGES.txt [Log mesage correction] : MIssing space in Diagnostics message -- Key: YARN-3584 URL: https://issues.apache.org/jira/browse/YARN-3584 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3584-1.patch, YARN-3584-2.patch The diagnostics message reads: "For more detailed output, check application tracking page: https://szxciitslx17640:26001/cluster/app/application_1430810985970_0020{color:red}Then{color}, click on links to logs of each attempt." Here "Then" is not part of the URL. It is better to put a space in between so that the URL can be copied directly for analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
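Editor's note: the fix is simply about separating the tracking URL from the word that follows it. The snippet below is a before/after illustration, not the committed RMAppAttemptImpl diff.
{code}
// Before/after illustration only, not the committed diff.
public class DiagnosticsMessageExample {
  public static void main(String[] args) {
    String url = "https://host:26001/cluster/app/application_1430810985970_0020";
    // broken: "Then" is fused onto the URL and gets copied as part of it
    String broken = "For more detailed output, check application tracking page:" + url
        + "Then, click on links to logs of each attempt.";
    // fixed: a space separates the URL from the sentence that follows
    String fixed = "For more detailed output, check application tracking page: " + url
        + " Then click on links to logs of each attempt.";
    System.out.println(broken);
    System.out.println(fixed);
  }
}
{code}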
[jira] [Commented] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience
[ https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533398#comment-14533398 ] Hudson commented on YARN-3523: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #188 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/188/]) YARN-3523. Cleanup ResourceManagerAdministrationProtocol interface audience. Contributed by Naganarasimha G R (junping_du: rev 8e991f4b1d7226fdcd75c5dc9fe6e5ce721679b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java * hadoop-yarn-project/CHANGES.txt Cleanup ResourceManagerAdministrationProtocol interface audience Key: YARN-3523 URL: https://issues.apache.org/jira/browse/YARN-3523 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Labels: newbie Fix For: 2.8.0 Attachments: YARN-3523.20150422-1.patch, YARN-3523.20150504-1.patch, YARN-3523.20150505-1.patch I noticed ResourceManagerAdministrationProtocol has @Private audience for the class and @Public audience for methods. It doesn't make sense to me. We should make class audience and methods audience consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
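Editor's note: the cleanup is about keeping class-level and method-level audience annotations consistent. The fragment below is a reduced illustration of that convention using Hadoop's annotation classes; it is not the actual ResourceManagerAdministrationProtocol.
{code}
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceStability.Evolving;

// Reduced illustration of consistent audience annotations; not the real protocol.
@Private
@Evolving
public interface AdminProtocolSketch {
  // methods follow the @Private audience declared on the interface rather than
  // being individually annotated @Public, so class and method audiences cannot diverge
  void refreshQueues() throws Exception;
}
{code}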
[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3134: Labels: (was: BB2015-05-TBR) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134DataSchema.pdf Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3134: Attachment: (was: YARN-3134-YARN-2928.006.patch) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134DataSchema.pdf Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533448#comment-14533448 ] Li Lu commented on YARN-644: Hi [~varun_saxena], thanks for the patch! Asserting on the content of the exception message may unnecessarily couple the exception message with the test, which makes future changes harder. Maybe we'd like to provide some central place for those exception message constants? Thanks! Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Varun Saxena Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: YARN-644.001.patch, YARN-644.002.patch, YARN-644.03.patch I see that validation/null checks are not performed on passed-in parameters, e.g. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest(). I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
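Editor's note: the "central place for exception message constants" suggestion might look roughly like the sketch below; the class and constant names are invented for illustration and are not from the YARN-644 patch.
{code}
// Illustrative sketch of the "central message constants" suggestion; names invented.
public final class StartContainerValidation {
  /** Shared constant so tests can reference the text instead of duplicating it. */
  public static final String NULL_CONTAINER_TOKEN_MSG =
      "Null container token identifier in startContainer request";

  static void checkTokenId(Object tokenId) {
    if (tokenId == null) {
      throw new IllegalArgumentException(NULL_CONTAINER_TOKEN_MSG);
    }
  }

  private StartContainerValidation() {
  }
}
{code}
A test could then assert against {{StartContainerValidation.NULL_CONTAINER_TOKEN_MSG}} rather than hard-coding the wording a second time, which keeps the message and the test from drifting apart.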
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533481#comment-14533481 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 51s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 25m 54s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731281/YARN-3134-YARN-2928.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / d4a2362 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7782/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7782/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7782/console | This message was automatically generated. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch, YARN-3134DataSchema.pdf Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simply our implementation read/write data from/to HBase, and can easily build index and compose complex query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3600) AM container link is broken (on a killed application, at least)
[ https://issues.apache.org/jira/browse/YARN-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin moved HIVE-10654 to YARN-3600: --- Component/s: (was: Web UI) Target Version/s: 2.8.0 Key: YARN-3600 (was: HIVE-10654) Project: Hadoop YARN (was: Hive) AM container link is broken (on a killed application, at least) --- Key: YARN-3600 URL: https://issues.apache.org/jira/browse/YARN-3600 Project: Hadoop YARN Issue Type: Bug Reporter: Sergey Shelukhin Running a fairly recent (a couple of weeks old) build of 2.8.0-SNAPSHOT. I have an application that ran fine for a while and then I yarn kill-ed it. Now when I go to the only app attempt URL (like so: http://(snip):8088/cluster/appattempt/appattempt_1429683757595_0795_01) I see: AM Container: container_1429683757595_0795_01_01 Node: N/A and the container URL is {noformat}http://cn042-10.l42scl.hortonworks.com:8088/cluster/N/A {noformat} which obviously doesn't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
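Editor's note: one plausible shape of a fix is to render the node link only when a real node address is known and fall back to plain text otherwise. The snippet below is purely illustrative; it is not the RM web UI code, and the URL pattern is invented.
{code}
// Purely illustrative guard; not the RM web UI code, URL pattern invented.
public class AmContainerLink {
  static String render(String rmAddress, String nodeHttpAddress, String containerId) {
    if (nodeHttpAddress == null || nodeHttpAddress.isEmpty()
        || "N/A".equals(nodeHttpAddress)) {
      // avoid producing links like http://host:8088/cluster/N/A
      return containerId + " (node information not available)";
    }
    return "<a href='" + rmAddress + "/cluster/container/" + containerId + "'>"
        + containerId + "</a>";
  }
}
{code}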
[jira] [Commented] (YARN-3385) Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.
[ https://issues.apache.org/jira/browse/YARN-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532429#comment-14532429 ] Hudson commented on YARN-3385: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/]) YARN-3385. Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes. Contributed by Zhihai Xu. (vinodkv: rev 4c7b9b6abe2452c9752a11214762be2e7665fb32) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion. --- Key: YARN-3385 URL: https://issues.apache.org/jira/browse/YARN-3385 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Labels: BB2015-05-TBR Fix For: 2.7.1 Attachments: YARN-3385.000.patch, YARN-3385.001.patch, YARN-3385.002.patch, YARN-3385.003.patch, YARN-3385.004.patch Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion(Op.delete). The race condition is similar as YARN-3023. since the race condition exists for ZK node creation, it should also exist for ZK node deletion. We see this issue with the following stack trace: {code} 2015-03-17 19:18:58,958 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:647) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:691) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2015-03-17 19:18:58,959 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
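Editor's note: the usual way to make a ZooKeeper delete idempotent is to treat NoNodeException as "already deleted" rather than escalating it to a fatal event. The sketch below shows that pattern in isolation; it is not necessarily how the committed ZKRMStateStore change is structured.
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Generic idempotent-delete pattern; not necessarily the committed ZKRMStateStore fix.
public class SafeDelete {
  static void deleteIfExists(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    try {
      zk.delete(path, -1);  // version -1 matches any version
    } catch (KeeperException.NoNodeException e) {
      // another actor already removed the node; that is the desired end state,
      // so swallow instead of surfacing a fatal STATE_STORE_OP_FAILED event
    }
  }
}
{code}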
[jira] [Commented] (YARN-3577) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/YARN-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532424#comment-14532424 ] Hudson commented on YARN-3577: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/]) YARN-3577. Misspelling of threshold in log4j.properties for tests. Contributed by Brahma Reddy Battula. (aajisaka: rev 918af8efff805d8f204ecfa6fe12c290a7a8e509) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/log4j.properties * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/log4j.properties * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/log4j.properties * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/log4j.properties * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/resources/log4j.properties * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/log4j.properties * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/resources/log4j.properties * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/resources/log4j.properties Misspelling of threshold in log4j.properties for tests -- Key: YARN-3577 URL: https://issues.apache.org/jira/browse/YARN-3577 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3577.patch log4j.properties file for test contains misspelling log4j.threshhold. We should use log4j.threshold correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
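Editor's note: the change is a one-word key rename in the test log4j.properties files, shown below as a representative excerpt.
{code}
# before: misspelled key, not recognized by log4j
log4j.threshhold=ALL
# after
log4j.threshold=ALL
{code}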
[jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.
[ https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532420#comment-14532420 ] Hudson commented on YARN-3491: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/]) YARN-3491. PublicLocalizer#addResource is too slow. (zxu via rkanter) (rkanter: rev b72507810aece08e17ab4b5aae1f7eae1fe98609) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java PublicLocalizer#addResource is too slow. Key: YARN-3491 URL: https://issues.apache.org/jira/browse/YARN-3491 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.8.0 Attachments: YARN-3491.000.patch, YARN-3491.001.patch, YARN-3491.002.patch, YARN-3491.003.patch, YARN-3491.004.patch Based on the profiling, The bottleneck in PublicLocalizer#addResource is getInitializedLocalDirs. getInitializedLocalDirs call checkLocalDir. checkLocalDir is very slow which takes about 10+ ms. The total delay will be approximately number of local dirs * 10+ ms. This delay will be added for each public resource localization. Because PublicLocalizer#addResource is slow, the thread pool can't be fully utilized. Instead of doing public resource localization in parallel(multithreading), public resource localization is serialized most of the time. And also PublicLocalizer#addResource is running in Dispatcher thread, So the Dispatcher thread will be blocked by PublicLocalizer#addResource for long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
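Editor's note: one way to address the bottleneck described above is to remember which local directories have already passed the expensive initialization check, so the roughly 10 ms checkLocalDir cost is not paid again for every public resource. The sketch below is illustrative only and is not the committed YARN-3491 change.
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative caching sketch; not the committed YARN-3491 change.
public class LocalDirCache {
  private final Set<String> initializedDirs = ConcurrentHashMap.newKeySet();

  /** Returns quickly for directories that have already been verified. */
  boolean ensureInitialized(String dir) {
    if (initializedDirs.contains(dir)) {
      return true;
    }
    boolean ok = expensiveCheckAndInit(dir);   // stands in for checkLocalDir()
    if (ok) {
      initializedDirs.add(dir);
    }
    return ok;
  }

  private boolean expensiveCheckAndInit(String dir) {
    // permission and existence checks for the directory tree would go here
    return true;
  }
}
{code}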
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532418#comment-14532418 ] Hudson commented on YARN-3243: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/]) YARN-3243. Moving CHANGES.txt entry to the right release. (vinodkv: rev 185e63a72638f01bb27e790c8dedf458923a2cae) * hadoop-yarn-project/CHANGES.txt CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0, 2.7.1 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Currently CapacityScheduler has some issues in making sure a ParentQueue always obeys its capacity limits, for example: 1) When allocating a container under a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container with size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code}
          A (usage=54, max=55)
         /                    \
        A1                    A2
(usage=1, max=55)   (usage=53, max=53)
{code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, a parent queue will only tell its children "you need to unreserve *some* resource so that I stay below my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} object in each queue class; *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (say the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can compute how much resource needs to be unreserved to keep within their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
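Editor's note: the headroom rule in the proposal above, child headroom = min(parent headroom, parent max - parent used), is small enough to state as a helper. The snippet below uses plain longs for brevity; real scheduler code operates on Resource objects rather than scalars.
{code}
// The proposed rule, on plain longs for brevity (real code uses Resource objects):
// child.headroom = min(parent.headroom, parent.max - parent.used)
public class HeadroomRule {
  static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
    return Math.min(parentHeadroom, parentMax - parentUsed);
  }

  public static void main(String[] args) {
    // with the example numbers above (A.max=55, A.used=54), a child of A can be
    // handed at most 1 more unit regardless of its own remaining capacity
    System.out.println(childHeadroom(20, 55, 54));   // prints 1
  }
}
{code}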