[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173427#comment-15173427 ] Chris Douglas commented on YARN-4734: - bq. For merge it at the top level, did you mean LICENSE.txt and BUILDING.txt? Are there any other files I need to change? {{NOTICE.txt}} may also need to be updated. No worries on the WIP, we can do a pass on the docs when it's ready to merge. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173510#comment-15173510 ] Varun Saxena commented on YARN-4700: Thanks [~Naganarasimha] for the patch. I have nothing further to add beyond what Vrushali said about the changes in the main code; I have the same comments. I looked at the test failures. For TestHBaseStorageFlowActivity, the FlowActivityRowKey constructor is used while parsing the row key, so I don't think we should change that. I think we can just change the timestamps of the app events and, as Vrushali suggested, keep all the timestamps within one day, so that we can test that different apps on a single day generate one flow for that day. Currently 4 flow activity entries appear because the app event timestamps generate 4 different top-of-the-day timestamps. For the other test case failure, i.e. in TestTimelineReaderWebServicesHBaseStorage, you will have to change the date-range queries because those REST queries are based on the current timestamp. > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-4700-YARN-2928.v1.001.patch, > YARN-4700-YARN-2928.wip.patch > > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still held in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, so each time we're > creating a new record for one application (the cluster id is part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
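The cluster-id point in the description above can be illustrated with a toy sketch. The class, method names, and the {{clusterId!user!flow}} key layout below are illustrative assumptions, not the actual timelineservice row-key code:

```java
// Toy illustration of the "extra record per RM restart" behavior; names and
// key layout are hypothetical, not the real ATS v2 schema.
class FlowRowKeyDemo {

  // Assumed row-key layout: clusterId!user!flow
  static String rowKey(String clusterId, String user, String flow) {
    return clusterId + "!" + user + "!" + flow;
  }

  // Problematic default: a cluster id derived from the RM start time, so every
  // restart yields a different row key for the same application.
  static String timestampClusterId(long rmStartMillis) {
    return "yarn-cluster-" + rmStartMillis;
  }

  public static void main(String[] args) {
    String beforeRestart = rowKey(timestampClusterId(1000L), "alice", "wordcount");
    String afterRestart  = rowKey(timestampClusterId(2000L), "alice", "wordcount");
    // Different keys: a second record appears for the same app after restart.
    System.out.println(beforeRestart.equals(afterRestart));

    // A stable default cluster id keeps the row key identical across restarts.
    System.out.println(rowKey("prod-cluster", "alice", "wordcount")
        .equals(rowKey("prod-cluster", "alice", "wordcount")));
  }
}
```

With a stable cluster id, the same application maps to the same row key across RM restarts, so no extra record is created.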
[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-4002: Attachment: 0001-YARN-4002.patch Updated the patch myself with a small correction. > make ResourceTrackerService.nodeHeartbeat more concurrent > - > > Key: YARN-4002 > URL: https://issues.apache.org/jira/browse/YARN-4002 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, > YARN-4002-rwlock.patch, YARN-4002-v0.patch > > > We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By > design the method ResourceTrackerService.nodeHeartbeat should be concurrent > enough to scale for large clusters. > But we have a "BIG" lock in NodesListManager.isValidNode which I think is > unnecessary. > First, the fields "includes" and "excludes" of HostsFileReader are only > updated on "refresh nodes". All RPC threads handling node heartbeats are > only readers. So a RWLock could be used to allow concurrent access by RPC > threads. > Second, since the fields "includes" and "excludes" of HostsFileReader are > always updated by "reference assignment", which is atomic in Java, the reader > side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173557#comment-15173557 ] Rohith Sharma K S commented on YARN-4002: - We recently hit this issue in 2K-node testing. It would be good to get this into branch-2.8. Nit on the patch: there is no need to take the read lock in printConfiguredHosts, since it is called from refreshHostsReader, which already holds the write lock. > make ResourceTrackerService.nodeHeartbeat more concurrent > - > > Key: YARN-4002 > URL: https://issues.apache.org/jira/browse/YARN-4002 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, > YARN-4002-rwlock.patch, YARN-4002-v0.patch > > > We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By > design the method ResourceTrackerService.nodeHeartbeat should be concurrent > enough to scale for large clusters. > But we have a "BIG" lock in NodesListManager.isValidNode which I think is > unnecessary. > First, the fields "includes" and "excludes" of HostsFileReader are only > updated on "refresh nodes". All RPC threads handling node heartbeats are > only readers. So a RWLock could be used to allow concurrent access by RPC > threads. > Second, since the fields "includes" and "excludes" of HostsFileReader are > always updated by "reference assignment", which is atomic in Java, the reader > side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
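The two locking schemes proposed in the description can be sketched as follows. This is a minimal illustration under assumed names, not the actual NodesListManager/HostsFileReader code; it shows the read-write lock on the reader path and the volatile-reference alternative that relies on atomic reference assignment:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the two approaches discussed above.
class HostsSnapshot {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // Volatile reference: readers could alternatively skip locking entirely,
  // because reference assignment in Java is atomic and volatile guarantees
  // visibility of the newly swapped-in set.
  private volatile Set<String> includes = Collections.emptySet();

  // Writer path ("refresh nodes"): build a fresh set and swap it in under
  // the write lock. Refreshes are rare, so writer cost is acceptable.
  void refresh(Set<String> newIncludes) {
    lock.writeLock().lock();
    try {
      includes = Collections.unmodifiableSet(newIncludes);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Reader path (heartbeat handling): the read lock admits many RPC threads
  // concurrently, unlike a single exclusive lock.
  boolean isValidNode(String host) {
    lock.readLock().lock();
    try {
      // Empty include list means "all hosts allowed" in this toy model.
      return includes.isEmpty() || includes.contains(host);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The second option in the description simply drops the read lock from {{isValidNode}} and keeps only the volatile field, which is safe here because readers only ever dereference one immutable snapshot.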
[jira] [Commented] (YARN-4746) yarn web services should convert parse failures of appId to 400
[ https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173560#comment-15173560 ] Steve Loughran commented on YARN-4746: -- Having played with this some more, I think it's probably wise to review all the uses of the conversion logic in the codebase; bits of it appear to assume that the return value is {{null}} if there's no match, rather than anything else. Regarding the patch: -1, I'm afraid, as it loses the stack trace. Look at what I've done in YARN-4696. > yarn web services should convert parse failures of appId to 400 > --- > > Key: YARN-4746 > URL: https://issues.apache.org/jira/browse/YARN-4746 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Priority: Minor > Attachments: 0001-YARN-4746.patch > > > I'm seeing somewhere in my WS API tests an error with exception > conversion of a bad app ID sent in as an argument to a GET. I know it's in > ATS, but a scan of the core RM web services implies the same problem. > {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} > to convert an argument; this throws IllegalArgumentException, which is then > handled somewhere by jetty as a 500 error. > In fact, it's a bad argument, which should be handled by returning a 400. > This can be done by catching the raised exception and explicitly converting it -- This message was sent by Atlassian JIRA (v6.3.4#6332)
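The 400-vs-500 conversion described above can be sketched roughly as below. {{BadRequestParam}} and {{parseAppId}} are hypothetical stand-ins for whatever the web layer maps to HTTP 400 and for {{ConverterUtils.toApplicationId}}; the point of the sketch is that the rethrow keeps the original exception as the cause, so the stack trace is not lost:

```java
// Sketch of converting a parse failure into a 400 while preserving the stack.
class AppIdParamDemo {

  // Stand-in for an exception the web layer maps to HTTP 400.
  static class BadRequestParam extends RuntimeException {
    BadRequestParam(String msg, Throwable cause) {
      super(msg, cause); // keep the original stack trace as the cause
    }
  }

  // Stand-in for ConverterUtils.toApplicationId: rejects malformed ids.
  static String parseAppId(String raw) {
    if (raw == null || !raw.startsWith("application_")) {
      throw new IllegalArgumentException("Invalid ApplicationId: " + raw);
    }
    return raw;
  }

  // The conversion point: catch IllegalArgumentException from the parser and
  // rethrow as a bad-request error instead of letting it surface as a 500.
  static String parseAppIdParam(String raw) {
    try {
      return parseAppId(raw);
    } catch (IllegalArgumentException e) {
      throw new BadRequestParam("Bad request: unparseable app ID '" + raw + "'", e);
    }
  }
}
```

Chaining the cause is what addresses the "loses the stack" objection: the 400 response can carry a clean message while logs still show where the parse actually failed.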
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173716#comment-15173716 ] wanglei-it commented on YARN-1506: -- Hi Junping Du, thanks for your work. I tested this feature in HA. When switching from RM1 to RM2, all the update information is lost; RM2 recovers the NM's resource configuration as it was originally registered. Right? > Replace set resource change on RMNode/SchedulerNode directly with event > notification. > - > > Key: YARN-1506 > URL: https://issues.apache.org/jira/browse/YARN-1506 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, > YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, > YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, > YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, > YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, > YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch > > > According to Vinod's comments on YARN-312 > (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), > we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
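The setter-versus-event distinction in the description can be sketched minimally. The dispatcher and event class below are toy stand-ins for YARN's real dispatcher machinery, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy sketch: instead of callers mutating node state directly
// (node.setResourceOption(...)), they post an event and every interested
// component (RMNode, SchedulerNode, ...) reacts on its own terms.
class ResourceUpdateDemo {

  static class ResourceChangedEvent {
    final String nodeId;
    final int newMemoryMb;
    ResourceChangedEvent(String nodeId, int newMemoryMb) {
      this.nodeId = nodeId;
      this.newMemoryMb = newMemoryMb;
    }
  }

  // Minimal synchronous dispatcher; YARN's AsyncDispatcher is asynchronous
  // and typed, but the decoupling idea is the same.
  static class Dispatcher {
    private final List<Consumer<ResourceChangedEvent>> handlers = new ArrayList<>();
    void register(Consumer<ResourceChangedEvent> h) { handlers.add(h); }
    void dispatch(ResourceChangedEvent e) { handlers.forEach(h -> h.accept(e)); }
  }
}
```

The benefit is that the component requesting the change no longer needs direct references to every object holding a copy of the node's resource.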
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173728#comment-15173728 ] Hadoop QA commented on YARN-4002: - -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 27s | trunk passed |
| +1 | compile | 0m 26s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 0m 29s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 4s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 29s | the patch passed |
| +1 | compile | 0m 24s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 24s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 26s | the patch passed |
| +1 | checkstyle | 0m 15s | the patch passed |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 15s | the patch passed |
| +1 | javadoc | 0m 19s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 24s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 70m 21s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. |
| -1 | unit | 71m 49s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 17s | Patch does not generate ASF License warnings. |
| | | 158m 14s | |

|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL |
[jira] [Updated] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM
[ https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-4696: - Attachment: YARN-4696-010.patch YARN-4696 patch 010. Addresses the checkstyle warnings. The FileSystemTimelineWriter uses FileSystem.newInstance() to create a new FS instance, with the chosen retry policies. > EntityGroupFSTimelineStore to work in the absence of an RM > -- > > Key: YARN-4696 > URL: https://issues.apache.org/jira/browse/YARN-4696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-4696-001.patch, YARN-4696-002.patch, > YARN-4696-003.patch, YARN-4696-005.patch, YARN-4696-006.patch, > YARN-4696-007.patch, YARN-4696-008.patch, YARN-4696-009.patch, > YARN-4696-010.patch > > > {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, with > the configuration pointing to it. This is a new change, and it impacts testing, > where you have historically been able to test without an RM running. > The sole purpose of the probe is to automatically determine if an app is > running; it falls back to "unknown" if not. If the RM connection were > optional, the "unknown" codepath could be called directly, relying on the age > of the file as a metric of completion. > Options > # add a flag to disable RM connect > # skip automatically if RM not defined/set to 0.0.0.0 > # disable retries on yarn client IPC; if it fails, tag the app as unknown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
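The reason {{FileSystem.newInstance()}} matters in the patch note above can be modeled with a toy cache. The real semantics live in {{org.apache.hadoop.fs.FileSystem}}, where {{get()}} returns a JVM-wide cached instance shared by all callers, while {{newInstance()}} returns a private instance that can safely carry its own retry settings; the sketch below only mimics that cached-vs-fresh distinction:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the FileSystem.get() cache vs FileSystem.newInstance()
// distinction. The real API is org.apache.hadoop.fs.FileSystem; this class
// only demonstrates why a private instance is needed for private settings.
class FsCacheDemo {

  static class Fs {
    int retries = 3; // mutating this on a shared instance would leak to all users
  }

  private static final Map<String, Fs> CACHE = new HashMap<>();

  // get(): shared, cached per URI; every caller sees the same object.
  static Fs get(String uri) {
    return CACHE.computeIfAbsent(uri, u -> new Fs());
  }

  // newInstance(): a fresh, private instance each time, so per-writer retry
  // policies do not affect other components using the same URI.
  static Fs newInstance(String uri) {
    return new Fs();
  }
}
```

This is why a writer that wants its own retry policy creates an uncached instance rather than reconfiguring the shared one.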
[jira] [Commented] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM
[ https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173824#comment-15173824 ] Hadoop QA commented on YARN-4696: - -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 11s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 28s | trunk passed |
| +1 | compile | 1m 42s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 2m 4s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 31s | trunk passed |
| +1 | mvnsite | 1m 14s | trunk passed |
| +1 | mvneclipse | 0m 40s | trunk passed |
| -1 | findbugs | 0m 26s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 0m 52s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 1m 6s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 59s | the patch passed |
| +1 | compile | 1m 41s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 1m 41s | the patch passed |
| +1 | compile | 2m 3s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 2m 3s | the patch passed |
| -1 | checkstyle | 0m 30s | hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 29 unchanged - 0 fixed = 30 total (was 29) |
| +1 | mvnsite | 1m 8s | the patch passed |
| +1 | mvneclipse | 0m 35s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| -1 | findbugs | 1m 21s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 47s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 1m 0s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 1m 51s | hadoop-yarn-common in the patch passed with JDK v1.8.0_72. |
| +1 | unit | 3m 37s | hadoop-yarn-server-applicationhistoryservice in the patch passed with JDK v1.8.0_72. |
| +1 | unit | 0m 50s | hadoop-yarn-server-timeline-pluginstorage in the patch passed with JDK v1.8.0_72. |
| +1 | unit | 2m 9s | hadoop-yarn-common in the patch passed with JDK
[jira] [Commented] (YARN-4750) App metrics may not be correct when an app is recovered
[ https://issues.apache.org/jira/browse/YARN-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173917#comment-15173917 ] Jian He commented on YARN-4750: --- This was intentional; the thinking was that persisting the metrics periodically while the app is running would put too much load on the state store. > App metrics may not be correct when an app is recovered > --- > > Key: YARN-4750 > URL: https://issues.apache.org/jira/browse/YARN-4750 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > > App metrics (rather, app attempt metrics) like Vcore-seconds and MB-seconds are > saved in the state store when there is an attempt state transition. Values > for running attempts will be in memory and will not be saved when there is an > RM restart/failover. For a recovered app, the metrics values will be reset. In > that case, these values will be incomplete. > Was this intentional or have we not found a correct way to fix it? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173941#comment-15173941 ] Junping Du commented on YARN-1506: -- Hi Wanglei, yes, your understanding is correct. We do not persist the updated resource configuration so far; other JIRAs (like YARN-1000) will track that effort. Thanks for your comments. > Replace set resource change on RMNode/SchedulerNode directly with event > notification. > - > > Key: YARN-1506 > URL: https://issues.apache.org/jira/browse/YARN-1506 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, > YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, > YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, > YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, > YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, > YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch > > > According to Vinod's comments on YARN-312 > (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), > we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4740) container complete msg may lost while AM restart in race condition
[ https://issues.apache.org/jira/browse/YARN-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174027#comment-15174027 ] Jian He commented on YARN-4740: --- Thanks [~sandflee]! - Could you add a check that the completed container is indeed in the returned allocate response? Also check that the completed container is not in RMAppAttemptImpl#justFinishedContainers. {code} // sleep a while to make sure allocate() gets the completed container; // before this msg reaches the AM, the AM may crash Thread.sleep(1000); am1.allocate( new ArrayList(), new ArrayList()); {code} - It's unnecessary to parameterize the test based on whether RMWorkPreservingEnabled is set, because the test does not do any RM restart at all. {code} testAMRestartNotLostContainerCompleteMsg(true); testAMRestartNotLostContainerCompleteMsg(false); {code} > container complete msg may lost while AM restart in race condition > -- > > Key: YARN-4740 > URL: https://issues.apache.org/jira/browse/YARN-4740 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4740.01.patch > > > 1, container completed, and the msg is stored in > RMAppAttempt.justFinishedContainers > 2, AM calls allocate and, before the allocateResponse reaches the AM, the AM crashes > 3, AM restarts and can't get the container complete msg. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4740) container complete msg may lost while AM restart in race condition
[ https://issues.apache.org/jira/browse/YARN-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174027#comment-15174027 ] Jian He edited comment on YARN-4740 at 3/1/16 5:05 PM: --- Thanks [~sandflee]! - Could you add a check that the completed container is indeed in the returned allocate response? Also check that the completed container is not in RMAppAttemptImpl#justFinishedContainers. {code} // sleep a while to make sure allocate() gets the completed container; // before this msg reaches the AM, the AM may crash Thread.sleep(1000); am1.allocate( new ArrayList(), new ArrayList()); {code} - It's unnecessary to parameterize the test based on whether RMWorkPreservingEnabled is set, because the test does not do any RM restart at all. {code} testAMRestartNotLostContainerCompleteMsg(true); testAMRestartNotLostContainerCompleteMsg(false); {code} - Please also add a comment about why this is done in transferStateFromAttempt. was (Author: jianhe): Thanks [~sandflee]! - Could you add a check that the completed container is indeed in the returned allocate response? Also check that the completed container is not in RMAppAttemptImpl#justFinishedContainers. {code} // sleep a while to make sure allocate() gets the completed container; // before this msg reaches the AM, the AM may crash Thread.sleep(1000); am1.allocate( new ArrayList(), new ArrayList()); {code} - It's unnecessary to parameterize the test based on whether RMWorkPreservingEnabled is set, because the test does not do any RM restart at all.
{code} testAMRestartNotLostContainerCompleteMsg(true); testAMRestartNotLostContainerCompleteMsg(false); {code} > container complete msg may lost while AM restart in race condition > -- > > Key: YARN-4740 > URL: https://issues.apache.org/jira/browse/YARN-4740 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4740.01.patch > > > 1, container completed, and the msg is stored in > RMAppAttempt.justFinishedContainers > 2, AM calls allocate and, before the allocateResponse reaches the AM, the AM crashes > 3, AM restarts and can't get the container complete msg. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
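The transfer behavior under review can be modeled in miniature. The class below is a toy stand-in for {{RMAppAttemptImpl}}, showing only the idea that pending completed-container messages are carried over to the new attempt so a restarted AM still receives them; everything else about the real class is omitted:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the race described in this issue: a container-completed message
// parked in justFinishedContainers must survive an AM restart. The field name
// mirrors RMAppAttemptImpl#justFinishedContainers; the logic is illustrative only.
class AttemptDemo {
  final List<String> justFinishedContainers = new ArrayList<>();

  // Sketch of the fix under review: when a new attempt takes over, carry the
  // pending completed-container messages across so the restarted AM can still
  // learn about containers that finished before the crash.
  void transferStateFromAttempt(AttemptDemo previous) {
    justFinishedContainers.addAll(previous.justFinishedContainers);
  }
}
```

Without the transfer, messages delivered to the crashed attempt's queue would simply be dropped, which is the lost-message scenario in the description.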
[jira] [Commented] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings
[ https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174082#comment-15174082 ] Sunil G commented on YARN-4634: --- Thanks [~leftnoteasy] for the comments. I agree that the earlier patch considered too many variables to decide whether to render with labels or not. Yes, it adds complexity. Meanwhile, the current system also has corner cases where label-queue mappings are not present. I think such cases can be handled by assuming that we render the UI with labels. To consolidate the idea here: if the cluster has labels other than DEFAULT_LABEL, and at least one such label has >0 active NMs, then we will render the UI with labels. Is this fine? > Scheduler UI/Metrics need to consider cases like non-queue label mappings > - > > Key: YARN-4634 > URL: https://issues.apache.org/jira/browse/YARN-4634 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4634.patch, 0002-YARN-4634.patch > > > Currently, when label-queue mappings are not available, a few assumptions are > made in the UI and in metrics. > In the above case, where labels are enabled and available in the cluster but > without any queue mappings, the UI displays queues under labels. This is not > correct. Currently, the labels-enabled check and the availability of labels > are considered when rendering the scheduler UI. We also need to check whether > - queue mappings are available > - nodes are mapped with labels with proper exclusivity flags on > This ticket will also look at the default queue configurations when labels > are not mapped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
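The consolidated rule proposed above can be written as a small predicate. The method name and map shape are illustrative assumptions, not the scheduler UI code; YARN's default partition is modeled here as the empty string:

```java
import java.util.Map;

// Sketch of the rule: render the partition-aware UI only when some
// non-default label exists AND at least one such label has >0 active NMs.
class LabelUiRuleDemo {

  // Toy stand-in for the default (no-label) partition.
  static final String DEFAULT_LABEL = "";

  // labelToActiveNMs: partition name -> number of active NodeManagers in it.
  static boolean renderWithLabels(Map<String, Integer> labelToActiveNMs) {
    return labelToActiveNMs.entrySet().stream()
        .anyMatch(e -> !DEFAULT_LABEL.equals(e.getKey()) && e.getValue() > 0);
  }
}
```

A cluster whose only populated partition is the default one, or whose non-default labels have no active NMs, would then fall back to the label-free rendering.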
[jira] [Created] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
Eric Payne created YARN-4751: Summary: In 2.7, Labeled queue usage not shown properly in capacity scheduler UI Key: YARN-4751 URL: https://issues.apache.org/jira/browse/YARN-4751 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler, yarn Affects Versions: 2.7.3 Reporter: Eric Payne Assignee: Eric Payne In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs separated by partition. When applications are running on a labeled queue, no color is shown in the bar graph, and several of the "Used" metrics are zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
[ https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-4751: - Attachment: 2.7 CS UI No BarGraph.jpg In the attached screenshot, please note that the {{Used Capacity}}, {{Absolute Used Capacity}}, and {{Active User Info::Used Resources}} values are all zero even though {{Num Containers}} is 11. The application runs and completes successfully. > In 2.7, Labeled queue usage not shown properly in capacity scheduler UI > --- > > Key: YARN-4751 > URL: https://issues.apache.org/jira/browse/YARN-4751 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.3 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: 2.7 CS UI No BarGraph.jpg > > > In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs > separated by partition. When applications are running on a labeled queue, no > color is shown in the bar graph, and several of the "Used" metrics are zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4737) Use CSRF Filter in YARN
[ https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-4737: - Attachment: (was: YARN-4737.patch.001) > Use CSRF Filter in YARN > --- > > Key: YARN-4737 > URL: https://issues.apache.org/jira/browse/YARN-4737 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager, webapp >Reporter: Jonathan Maron >Assignee: Jonathan Maron > Attachments: YARN-4737.001.patch > > > A CSRF filter was added to hadoop common > (https://issues.apache.org/jira/browse/HADOOP-12691). The aim of this JIRA > is to come up with a mechanism to integrate this filter into the webapps for > which it is applicable (web apps that may establish an authenticated > identity). That includes the RM, NM, and mapreduce jobhistory web app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4737) Use CSRF Filter in YARN
[ https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-4737: - Attachment: YARN-4737.001.patch > Use CSRF Filter in YARN > --- > > Key: YARN-4737 > URL: https://issues.apache.org/jira/browse/YARN-4737 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager, webapp >Reporter: Jonathan Maron >Assignee: Jonathan Maron > Attachments: YARN-4737.001.patch > > > A CSRF filter was added to hadoop common > (https://issues.apache.org/jira/browse/HADOOP-12691). The aim of this JIRA > is to come up with a mechanism to integrate this filter into the webapps for > which it is applicable (web apps that may establish an authenticated > identity). That includes the RM, NM, and mapreduce jobhistory web app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
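For context on the mechanism being integrated: CSRF-prevention filters of this kind typically work by requiring a custom HTTP header on state-changing requests, because browsers do not let cross-origin pages set custom headers. The sketch below shows that core check only; the header name and safe-method list are illustrative placeholders, not the actual configuration of the HADOOP-12691 filter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the core decision a CSRF-prevention filter makes. A request is
// allowed if its method is read-only, or if it carries the expected custom
// header (which a cross-site attacker's browser cannot add).
class CsrfCheck {
    static final Set<String> SAFE_METHODS =
        new HashSet<>(Arrays.asList("GET", "HEAD", "OPTIONS"));
    static final String HEADER = "X-XSRF-HEADER"; // illustrative name

    // headerValue is the value of HEADER on the request, or null if absent.
    static boolean allow(String method, String headerValue) {
        return SAFE_METHODS.contains(method) || headerValue != null;
    }
}
```

Wiring a check like this into the RM, NM, and jobhistory webapps is then a matter of registering the filter on those servers' servlet contexts, which is what this JIRA tracks.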
[jira] [Commented] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
[ https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174114#comment-15174114 ] Sunil G commented on YARN-4751: --- [~eepayne], Thanks for updating this. YARN-4304 fixed this issue for a few of these metrics. On trunk, I can see these metrics reported correctly, so I think one UI fix has not been picked back to 2.7. > In 2.7, Labeled queue usage not shown properly in capacity scheduler UI > --- > > Key: YARN-4751 > URL: https://issues.apache.org/jira/browse/YARN-4751 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.3 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: 2.7 CS UI No BarGraph.jpg > > > In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs > separated by partition. When applications are running on a labeled queue, no > color is shown in the bar graph, and several of the "Used" metrics are zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings
[ https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174165#comment-15174165 ] Wangda Tan commented on YARN-4634: -- [~sunilg], sounds good. > Scheduler UI/Metrics need to consider cases like non-queue label mappings > - > > Key: YARN-4634 > URL: https://issues.apache.org/jira/browse/YARN-4634 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4634.patch, 0002-YARN-4634.patch > > > Currently when label-queue mappings are not available, there are few > assumptions taken in UI and in metrics. > In above case where labels are enabled and available in cluster but without > any queue mappings, UI displays queues under labels. This is not correct. > Currently labels enabled check and availability of labels are considered to > render scheduler UI. Henceforth we also need to check whether > - queue-mappings are available > - nodes are mapped with labels with proper exclusivity flags on > This ticket also will try to see the default configurations in queue when > labels are not mapped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings
[ https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4634: -- Attachment: 0003-YARN-4634.patch Updating patch as per the above mentioned comments. > Scheduler UI/Metrics need to consider cases like non-queue label mappings > - > > Key: YARN-4634 > URL: https://issues.apache.org/jira/browse/YARN-4634 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4634.patch, 0002-YARN-4634.patch, > 0003-YARN-4634.patch > > > Currently when label-queue mappings are not available, there are few > assumptions taken in UI and in metrics. > In above case where labels are enabled and available in cluster but without > any queue mappings, UI displays queues under labels. This is not correct. > Currently labels enabled check and availability of labels are considered to > render scheduler UI. Henceforth we also need to check whether > - queue-mappings are available > - nodes are mapped with labels with proper exclusivity flags on > This ticket also will try to see the default configurations in queue when > labels are not mapped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
[ https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174314#comment-15174314 ] Eric Payne commented on YARN-4751: -- Thanks, [~sunilg], for pointing out YARN-4304. I see that several other JIRAs would also need to be pulled back if YARN-4304 is cherry-picked to 2.7, including YARN-1651, YARN-2003, YARN-3362, YARN-3463, YARN-3961, YARN-4082, and YARN-4162. Is that correct? I think it would be better to have a 2.7-specific patch for YARN-4304. Is that something you would be willing to provide? > In 2.7, Labeled queue usage not shown properly in capacity scheduler UI > --- > > Key: YARN-4751 > URL: https://issues.apache.org/jira/browse/YARN-4751 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.3 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: 2.7 CS UI No BarGraph.jpg > > > In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs > separated by partition. When applications are running on a labeled queue, no > color is shown in the bar graph, and several of the "Used" metrics are zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages
[ https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-4681: Assignee: Jan Lukavsky > ProcfsBasedProcessTree should not calculate private clean pages > --- > > Key: YARN-4681 > URL: https://issues.apache.org/jira/browse/YARN-4681 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.6.0, 2.7.0 >Reporter: Jan Lukavsky >Assignee: Jan Lukavsky > Attachments: YARN-4681.patch, YARN-4681.patch > > > ProcfsBasedProcessTree in the NodeManager calculates memory used by a process > tree by parsing {{/proc/<pid>/smaps}}, where it calculates {{min(Pss, > Shared_Dirty) + Private_Dirty + Private_Clean}}. Because private clean pages > that are not {{mlocked}} can be reclaimed by the kernel, this should be changed to > counting only {{Locked}} pages instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages
[ https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174361#comment-15174361 ] Chris Nauroth commented on YARN-4681: - [~je.ik], thank you for updating the patch. I'm +1 for this change, pending a pre-commit test run from Jenkins. I just clicked the Submit Patch button, so Jenkins should pick it up now. However, I'm not confident enough to commit it immediately. I'd like to see reviews from committers who spend more time in YARN than I do. I'd also like to find out whether anyone thinks it should be configurable whether the check uses {{Locked}} or performs the old calculation. I don't have a sense of how widely people depend on the current smaps checks. > ProcfsBasedProcessTree should not calculate private clean pages > --- > > Key: YARN-4681 > URL: https://issues.apache.org/jira/browse/YARN-4681 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.6.0, 2.7.0 >Reporter: Jan Lukavsky >Assignee: Jan Lukavsky > Attachments: YARN-4681.patch, YARN-4681.patch > > > ProcfsBasedProcessTree in the NodeManager calculates memory used by a process > tree by parsing {{/proc/<pid>/smaps}}, where it calculates {{min(Pss, > Shared_Dirty) + Private_Dirty + Private_Clean}}. Because private clean pages > that are not {{mlocked}} can be reclaimed by the kernel, this should be changed to > counting only {{Locked}} pages instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
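For readers following the formula change under discussion, here is a tiny sketch of the two per-mapping accounting rules (values in kB as read from a process's smaps entries). This illustrates the arithmetic only; it is not the actual ProcfsBasedProcessTree parser, and the method names are made up for the example.

```java
// Sketch of the two smaps accounting formulas discussed in this issue.
// Inputs are per-mapping counters in kB; names mirror the smaps fields.
class SmapsMath {
    // Current accounting: min(Pss, Shared_Dirty) + Private_Dirty + Private_Clean.
    static long oldCharge(long pss, long sharedDirty, long privateDirty, long privateClean) {
        return Math.min(pss, sharedDirty) + privateDirty + privateClean;
    }

    // Proposed accounting: charge only pages the process has locked in RAM,
    // since unlocked private clean pages are reclaimable by the kernel.
    static long newCharge(long locked) {
        return locked;
    }
}
```

For example, a mapping with Pss=10, Shared_Dirty=5, Private_Dirty=3 and Private_Clean=7 is charged 15 kB under the current rule; if none of it is locked, the proposed rule charges 0 kB, reflecting that the clean pages can be dropped under memory pressure instead of counting against the container.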
[jira] [Commented] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages
[ https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174387#comment-15174387 ] Hadoop QA commented on YARN-4681: - | (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 47s | trunk passed |
| +1 | compile | 0m 22s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 0m 25s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 19s | trunk passed |
| +1 | mvnsite | 0m 31s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 28s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 32s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 26s | the patch passed |
| +1 | compile | 0m 20s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 20s | the patch passed |
| +1 | compile | 0m 23s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 23s | the patch passed |
| -1 | checkstyle | 0m 17s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: patch generated 3 new + 32 unchanged - 4 fixed = 35 total (was 36) |
| +1 | mvnsite | 0m 29s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 17s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 30s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 1m 51s | hadoop-yarn-common in the patch passed with JDK v1.8.0_72. |
| +1 | unit | 2m 9s | hadoop-yarn-common in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 18s | Patch does not generate ASF License warnings. |
| | | 20m 28s | |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12787236/YARN-4681.patch |
| JIRA Issue | YARN-4681 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 32c7221bb89b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision
[jira] [Commented] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings
[ https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174436#comment-15174436 ] Hadoop QA commented on YARN-4634: - | (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 57s | trunk passed |
| +1 | compile | 0m 26s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 0m 28s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 18s | trunk passed |
| +1 | mvnsite | 0m 34s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 4s | trunk passed |
| +1 | javadoc | 0m 20s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 29s | the patch passed |
| +1 | compile | 0m 25s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 25s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 26s | the patch passed |
| -1 | checkstyle | 0m 16s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 1 new + 55 unchanged - 0 fixed = 56 total (was 55) |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| -1 | findbugs | 1m 20s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 22s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 24s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 71m 4s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. |
| -1 | unit | 72m 1s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | Patch does not generate ASF License warnings. |
| | | 159m 58s | |
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| | Nullcheck of CapacitySchedulerPage$QueuesBlock.nodeLabelsInfo at line 419 of value previously dereferenced in org.apach
[jira] [Commented] (YARN-4671) There is no need to acquire CS lock when completing a container
[ https://issues.apache.org/jira/browse/YARN-4671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174449#comment-15174449 ] Hudson commented on YARN-4671: -- SUCCESS: Integrated in Hadoop-trunk-Commit #9404 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9404/]) YARN-4671. There is no need to acquire CS lock when completing a (jianhe: rev 5c465df90414d43250d09084748ab2d41af44eea) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java > There is no need to acquire CS lock when completing a container > --- > > Key: YARN-4671 > URL: https://issues.apache.org/jira/browse/YARN-4671 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: MENG DING >Assignee: MENG DING > Fix For: 2.8.0 > > Attachments: YARN-4671.1.patch, YARN-4671.2.patch > > > In YARN-4519, we discovered that there is no need to acquire CS lock in > CS#completedContainerInternal, because: > * Access to critical section are already guarded by queue lock. > * It is not essential to guard {{schedulerHealth}} with cs lock in > completedContainerInternal. All maps in schedulerHealth are concurrent maps. > Even if schedulerHealth is not consistent at the moment, it will be > eventually consistent. > With this fix, we can truly claim that CS#allocate doesn't require CS lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
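The locking shape that the YARN-4671 change relies on can be sketched as follows. This is an illustrative miniature only, not the real CapacityScheduler code: the class, field, and counter names are invented for the example, and it shows just the two points from the description, a queue-level lock guarding the critical section, and concurrent maps for the health counters so the scheduler-wide lock can be dropped.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: container completion needs only queue-level locking; the health
// counters are kept in a concurrent map, so no coarse scheduler lock is
// taken. Readers of the map may briefly lag a writer but become consistent,
// which is the "eventually consistent" point made in the description.
class CompletionSketch {
    private final Object queueLock = new Object();
    private final Map<String, Long> schedulerHealth = new ConcurrentHashMap<>();

    void completedContainer(String containerId) {
        synchronized (queueLock) {
            // release the container's resources from the queue ...
        }
        // Safe without any scheduler-wide lock: ConcurrentHashMap handles
        // the synchronization for this single update.
        schedulerHealth.merge("releasedContainers", 1L, Long::sum);
    }

    long released() {
        return schedulerHealth.getOrDefault("releasedContainers", 0L);
    }
}
```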
[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN
[ https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174555#comment-15174555 ] Hadoop QA commented on YARN-4737: - | (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 15s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 42s | trunk passed |
| +1 | compile | 6m 0s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 6m 47s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 1m 8s | trunk passed |
| +1 | mvnsite | 3m 22s | trunk passed |
| +1 | mvneclipse | 1m 35s | trunk passed |
| +1 | findbugs | 6m 40s | trunk passed |
| +1 | javadoc | 2m 45s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 5m 23s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 52s | the patch passed |
| +1 | compile | 6m 1s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 6m 1s | the patch passed |
| +1 | compile | 6m 37s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 6m 37s | the patch passed |
| -1 | checkstyle | 1m 8s | root: patch generated 3 new + 387 unchanged - 0 fixed = 390 total (was 387) |
| +1 | mvnsite | 3m 22s | the patch passed |
| +1 | mvneclipse | 1m 36s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | findbugs | 8m 2s | the patch passed |
| +1 | javadoc | 2m 38s | the patch passed with JDK v1.8.0_72 |
| -1 | javadoc | 9m 25s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-jdk1.7.0_95 with JDK v1.7.0_95 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 5m 22s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 0m 24s | hadoop-yarn-api in the patch passed with JDK v1.8.0_72. |
| +1 | unit | 1m 56s | hadoop-yarn-common in the patch passed with JDK v1.8.0_72. |
| -1 | unit | 9m 1s | hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_72. |
| -1 | {
[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries
[ https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174594#comment-15174594 ] Karthik Kambatla commented on YARN-4719: bq. For ClusterNodeTracker#nodes, can we use lock-free data structure to avoid copying the whole set? I'm not sure I understand the suggestion; could you elaborate? bq. We'd better not add addBlacklistedNodeIdsToList to ClusterNodeTracker since it calls application's logic, we should only include node related stuffs to ClusterNodeTracker. I feel any logic that has to iterate through all nodes should go through ClusterNodeTracker - that way, we don't run into cases where we access the list of nodes without a lock. Alternatively, we could get a list of nodeIDs from ClusterNodeTracker and then look up individual nodes. I am not particular about which approach, but I also don't quite see an issue with it being part of ClusterNodeTracker. Any particular reason you think this doesn't belong here? > Add a helper library to maintain node state and allows common queries > - > > Key: YARN-4719 > URL: https://issues.apache.org/jira/browse/YARN-4719 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4719-1.patch, yarn-4719-2.patch > > > The scheduler could use a helper library to maintain node state and allowing > matching/sorting queries. Several reasons for this: > # Today, a lot of the node state management is done separately in each > scheduler. Having a single library will take us that much closer to reducing > duplication among schedulers. > # Adding a filtering/matching API would simplify node labels and locality > significantly. > # An API that returns a sorted list for a custom comparator would help > YARN-1011 where we want to sort by allocation and utilization for > continuous/asynchronous and opportunistic scheduling respectively. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
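The helper-library shape described in the issue (one class owning the node map, with filter and sort queries performed under its lock) can be sketched as below. The class and method names are invented for illustration and are not the patch's actual API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Predicate;

// Sketch of a node-tracking helper: all access to the node map goes through
// this class, so callers can never iterate the nodes without holding the
// lock. Queries return snapshots, built under the read lock.
class NodeTrackerSketch<N> {
    private final Map<String, N> nodes = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void addNode(String id, N node) {
        lock.writeLock().lock();
        try { nodes.put(id, node); } finally { lock.writeLock().unlock(); }
    }

    // Filtering/matching query (e.g. "nodes with label X", locality matches).
    List<N> getNodes(Predicate<N> filter) {
        lock.readLock().lock();
        try {
            List<N> out = new ArrayList<>();
            for (N n : nodes.values()) {
                if (filter.test(n)) out.add(n);
            }
            return out;
        } finally { lock.readLock().unlock(); }
    }

    // Sorted snapshot for a caller-supplied comparator
    // (e.g. by allocation or utilization, as YARN-1011 wants).
    List<N> sortedNodes(Comparator<N> cmp) {
        List<N> out = getNodes(n -> true);
        out.sort(cmp);
        return out;
    }
}
```

Blacklist handling could either live here as another `Predicate`-style query or be done by the caller over a snapshot of node IDs, which is exactly the trade-off debated in the comment above.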
[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4700: Attachment: YARN-4700-YARN-2928.v1.002.patch Thanks for the comments [~varun_saxena] and [~vrushalic], bq. I believe the constructor for FlowActivityRowKey should change to correctly calculate top of the day timestamp given the input timestamp. As [~varun_saxena] mentioned ??FlowActivityRowKey constructor is used while parsing row key so I don't think we should be changing that?? and without changing it, the tests passed. bq. It might be more explicit to fetch the exact created (or finished) event from the TimelineEntity and use the timestamp that belong to either ApplicationMetricsConstants.CREATED_EVENT_TYPE or I have refactored quite a bit here to avoid looping over the events in multiple places. Please check. I have addressed all the other comments by correcting the timestamps. > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-4700-YARN-2928.v1.001.patch, > YARN-4700-YARN-2928.v1.002.patch, YARN-4700-YARN-2928.wip.patch > > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
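For readers following the "top of the day" discussion: the truncation that groups app events into per-day flow activity rows is simple modular arithmetic. The branch has its own utility for this; the helper below only illustrates the calculation, for non-negative epoch timestamps with UTC day boundaries.

```java
// Sketch of the "top of the day" truncation the flow activity row key is
// built from: drop the remainder of the timestamp modulo one day, leaving
// the epoch-millis value for midnight (UTC) of that day.
class TopOfDay {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    static long topOfDay(long ts) {
        return ts - (ts % MILLIS_PER_DAY);
    }
}
```

Two app events whose timestamps fall inside the same UTC day truncate to the same value and so land in one flow activity row, while timestamps spread across four different days produce four rows, which is exactly the symptom the test currently shows and why the comments suggest keeping all test timestamps within one day.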
[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4700: Attachment: (was: YARN-4700-YARN-2928.v1.002.patch) > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-4700-YARN-2928.v1.001.patch, > YARN-4700-YARN-2928.v1.002.patch, YARN-4700-YARN-2928.wip.patch > > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4700: Attachment: YARN-4700-YARN-2928.v1.002.patch > ATS storage has one extra record each time the RM got restarted > --- > > Key: YARN-4700 > URL: https://issues.apache.org/jira/browse/YARN-4700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Li Lu >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-4700-YARN-2928.v1.001.patch, > YARN-4700-YARN-2928.v1.002.patch, YARN-4700-YARN-2928.wip.patch > > > When testing the new web UI for ATS v2, I noticed that we're creating one > extra record for each finished application (but still hold in the RM state > store) each time the RM got restarted. It's quite possible that we add the > cluster start timestamp into the default cluster id, thus each time we're > creating a new record for one application (cluster id is a part of the row > key). We need to fix this behavior, probably by having a better default > cluster id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container
[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174775#comment-15174775 ] Vinod Kumar Vavilapalli commented on YARN-1040: --- bq. General case: AM launches multiple containers at the same time. This is essentially container-groups - we should keep this option open. Clarification on what I meant here: It's okay for now to only design APIs (and defer implementation) so that even if our first version of implementation only covers allocation-vs-container delinking, container-groups are possible in future without further API changes/addition. > De-link container life cycle from the process and add ability to execute > multiple processes in the same long-lived container > > > Key: YARN-1040 > URL: https://issues.apache.org/jira/browse/YARN-1040 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Steve Loughran > > The AM should be able to exec >1 process in a container, rather than have the > NM automatically release the container when the single process exits. > This would let an AM restart a process on the same container repeatedly, > which for HBase would offer locality on a restarted region server. > We may also want the ability to exec multiple processes in parallel, so that > something could be run in the container while a long-lived process was > already running. This can be useful in monitoring and reconfiguring the > long-lived process, as well as shutting it down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-4002: Target Version/s: 2.8.0 > make ResourceTrackerService.nodeHeartbeat more concurrent > - > > Key: YARN-4002 > URL: https://issues.apache.org/jira/browse/YARN-4002 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, > YARN-4002-rwlock.patch, YARN-4002-v0.patch > > > We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By > design the method ResourceTrackerService.nodeHeartbeat should be concurrent > enough to scale for large clusters. > But we have a "BIG" lock in NodesListManager.isValidNode which I think is > unnecessary. > First, the fields "includes" and "excludes" of HostsFileReader are only > updated on "refresh nodes". All RPC threads handling node heartbeats are > only readers. So an RWLock could be used to allow concurrent access by RPC > threads. > Second, since the fields "includes" and "excludes" of HostsFileReader are > always updated by "reference assignment", which is atomic in Java, the reader > side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174776#comment-15174776 ] Rohith Sharma K S commented on YARN-4002: - [~leftnoteasy] would you like to have a look at the patch? If there are no comments, I will go ahead and commit it. > make ResourceTrackerService.nodeHeartbeat more concurrent > - > > Key: YARN-4002 > URL: https://issues.apache.org/jira/browse/YARN-4002 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, > YARN-4002-rwlock.patch, YARN-4002-v0.patch > > > We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By > design the method ResourceTrackerService.nodeHeartbeat should be concurrent > enough to scale for large clusters. > But we have a "BIG" lock in NodesListManager.isValidNode which I think is > unnecessary. > First, the fields "includes" and "excludes" of HostsFileReader are only > updated on "refresh nodes". All RPC threads handling node heartbeats are > only readers. So an RWLock could be used to allow concurrent access by RPC > threads. > Second, since the fields "includes" and "excludes" of HostsFileReader are > always updated by "reference assignment", which is atomic in Java, the reader > side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
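The read-write-lock idea from the issue description above can be sketched as follows. This is an illustrative sketch, not the Hadoop `HostsFileReader` class: the class name, fields, and validity rule are simplified assumptions. Heartbeat-handling RPC threads take the read lock and proceed concurrently; only "refresh nodes" takes the write lock.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the RWLock approach: many concurrent readers, one rare writer.
class HostsReaderSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private Set<String> includes = new HashSet<>();
  private Set<String> excludes = new HashSet<>();

  // Called by many RPC threads per heartbeat; the read lock lets them run in parallel.
  boolean isValidNode(String host) {
    lock.readLock().lock();
    try {
      return (includes.isEmpty() || includes.contains(host))
          && !excludes.contains(host);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Called only on "refresh nodes"; the write lock blocks readers briefly.
  void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    lock.writeLock().lock();
    try {
      includes = new HashSet<>(newIncludes);
      excludes = new HashSet<>(newExcludes);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Unlike a single `synchronized` monitor, a `ReentrantReadWriteLock` never serializes the heartbeat readers against each other, only against the occasional refresh.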
[jira] [Created] (YARN-4752) [Umbrella] FairScheduler: Improve preemption
Karthik Kambatla created YARN-4752: -- Summary: [Umbrella] FairScheduler: Improve preemption Key: YARN-4752 URL: https://issues.apache.org/jira/browse/YARN-4752 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.8.0 Reporter: Karthik Kambatla A number of issues have been reported with respect to preemption in FairScheduler along the lines of: # FairScheduler preempts resources from nodes even if the resultant free resources cannot fit the incoming request. # Preemption doesn't preempt from sibling queues # Preemption doesn't preempt from sibling apps under the same queue that is over its fairshare # ... Filing this umbrella JIRA to group all the issues together and think of a comprehensive solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers
[ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3997: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4752 > An Application requesting multiple core containers can't preempt running > application made of single core containers > --- > > Key: YARN-3997 > URL: https://issues.apache.org/jira/browse/YARN-3997 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.7.1 > Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines >Reporter: Dan Shechter >Assignee: Arun Suresh >Priority: Critical > > When our cluster is configured with preemption, and is fully loaded with an > application consuming 1-core containers, it will not kill off these > containers when a new application kicks in requesting containers with a size > > 1, for example 4-core containers. > When the "second" application attempts to use 1-core containers as well, > preemption proceeds as planned and everything works properly. > It is my assumption that the fair-scheduler, while recognizing it needs to > kill off some containers to make room for the new application, fails to find a > SINGLE container satisfying the request for a 4-core container (since all > existing containers are 1-core containers), and isn't "smart" enough to > realize it needs to kill off 4 single-core containers (in this case) on a > single node for the new application to be able to proceed... > The exhibited effect is that the new application hangs indefinitely and > never gets the resources it requires. > This can easily be replicated with any yarn application. > Our "goto" scenario in this case is running pyspark with 1-core executors > (containers) while trying to launch the h2o.ai framework, which INSISTS on having > at least 4 cores per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
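The reasoning the reporter describes above can be sketched as a per-node check: a 4-core request is satisfiable by killing four 1-core containers on the same node, even though no single container is large enough. The helper below is hypothetical, not FairScheduler code; it only illustrates the aggregation the scheduler is missing.

```java
import java.util.List;

// Hypothetical sketch: can preempting several small containers on ONE node
// free enough cores for a larger incoming request?
class NodeFitSketch {
  static boolean canFreeEnoughOnNode(List<Integer> preemptableCores,
                                     int requestedCores) {
    int sum = 0;
    for (int cores : preemptableCores) {
      sum += cores;                       // aggregate, not per-container, fit
      if (sum >= requestedCores) return true;
    }
    return false;
  }
}
```

With this view, a node running four preemptable 1-core containers does satisfy a 4-core request, whereas a search for one container of size >= 4 finds nothing and the request starves.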
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174795#comment-15174795 ] Hong Zhiguo commented on YARN-4002: --- Hi, [~rohithsharma], thanks for the refinement. But why not take the lockless version? > make ResourceTrackerService.nodeHeartbeat more concurrent > - > > Key: YARN-4002 > URL: https://issues.apache.org/jira/browse/YARN-4002 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, > YARN-4002-rwlock.patch, YARN-4002-v0.patch > > > We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By > design the method ResourceTrackerService.nodeHeartbeat should be concurrent > enough to scale for large clusters. > But we have a "BIG" lock in NodesListManager.isValidNode which I think is > unnecessary. > First, the fields "includes" and "excludes" of HostsFileReader are only > updated on "refresh nodes". All RPC threads handling node heartbeats are > only readers. So an RWLock could be used to allow concurrent access by RPC > threads. > Second, since the fields "includes" and "excludes" of HostsFileReader are > always updated by "reference assignment", which is atomic in Java, the reader > side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case
[ https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3405: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4752 > FairScheduler's preemption cannot happen between sibling in some case > - > > Key: YARN-3405 > URL: https://issues.apache.org/jira/browse/YARN-3405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Labels: BB2015-05-TBR > Attachments: YARN-3405.01.patch, YARN-3405.02.patch > > > Queue hierarchy described as below: > {noformat} > root >/ \ >queue-1 queue-2 > / \ > queue-1-1 queue-1-2 > {noformat} > Assume cluster resource is 100 > # queue-1-1 and queue-2 each have an app. Each gets 50 usage and 50 fairshare. > # When queue-1-2 becomes active, it causes a new preemption request for > fairshare 25. > # When preempting from root, it is possible that the preemption candidate found > is queue-2. If so, preemptContainerPreCheck for queue-2 returns false because > it's equal to its fairshare. > # Finally queue-1-2 will be waiting for resource release from queue-1-1 > itself. > What I expect here is that queue-1-2 preempts from queue-1-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4333) Fair scheduler should support preemption within queue
[ https://issues.apache.org/jira/browse/YARN-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4333: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-4752 > Fair scheduler should support preemption within queue > - > > Key: YARN-4333 > URL: https://issues.apache.org/jira/browse/YARN-4333 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie > Attachments: YARN-4333.001.patch, YARN-4333.002.patch, > YARN-4333.003.patch > > > Now each app in the fair scheduler is allocated its fairshare; however, the fairshare > resource is not ensured even if fairSharePreemption is enabled. > Consider: > 1, When the cluster is idle, we submit app1 to queueA, which takes the maxResource > of queueA. > 2, Then the cluster becomes busy, but app1 does not release any resource; > queueA's resource usage is over its fairshare > 3, Then we submit app2 (maybe with higher priority) to queueA. Now app2 has > its own fairshare, but it cannot obtain any resource, since queueA is still > over its fairshare and resources will not be assigned to queueA anymore. Also, > preemption is not triggered in this case. > So we should allow preemption within a queue when an app is starved for its fairshare. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2154) FairScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2154: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-4752 > FairScheduler: Improve preemption to preempt only those containers that would > satisfy the incoming request > -- > > Key: YARN-2154 > URL: https://issues.apache.org/jira/browse/YARN-2154 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Arun Suresh >Priority: Critical > Attachments: YARN-2154.1.patch > > > Today, FairScheduler uses a spray-gun approach to preemption. Instead, it > should only preempt resources that would satisfy the incoming request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account
[ https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4120: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4752 > FSAppAttempt.getResourceUsage() should not take preemptedResource into account > -- > > Key: YARN-4120 > URL: https://issues.apache.org/jira/browse/YARN-4120 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Xianyin Xin > > When computing resource usage for Schedulables, the following code is involved, > {{FSAppAttempt.getResourceUsage}}, > {code} > public Resource getResourceUsage() { > return Resources.subtract(getCurrentConsumption(), getPreemptedResources()); > } > {code} > and this value is aggregated to FSLeafQueues and FSParentQueues. In my > opinion, taking {{preemptedResource}} into account here is not reasonable, > for two main reasons: > # it is something in the future, i.e., even though these resources are marked as > preempted, they are currently used by the app, and they will be > subtracted from {{currentConsumption}} once the preemption is finished. It's > not reasonable to account for that ahead of time. > # there's another problem here; consider the following case, > {code} > root >/\ > queue1 queue2 > /\ > queue1.3, queue1.4 > {code} > suppose queue1.3 needs resources and it can preempt resources from queue1.4; > the preemption happens in the interior of queue1. But when computing the resource > usage of queue1, {{queue1.resourceUsage = its_current_resource_usage - > preemption}} according to the current code, which is unfair to queue2 when > doing resource allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
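The unfairness above comes down to simple arithmetic, which the sketch below makes explicit. The numbers are hypothetical: suppose queue1 and queue2 each actually hold 50 units, and 10 of queue1's units are merely marked for preemption inside queue1 (queue1.3 preempting queue1.4) but not yet released.

```java
// Sketch of the two accounting rules; not FairScheduler code.
class UsageSketch {
  static int reportedUsage(int currentConsumption, int markedPreempted,
                           boolean subtractPreempted) {
    // Current code subtracts resources that are only *marked* preempted;
    // the proposed rule reports what is actually consumed right now.
    return subtractPreempted ? currentConsumption - markedPreempted
                             : currentConsumption;
  }
}
```

Under the current rule queue1 reports 40 against queue2's 50, so the allocator favors queue1 even though both queues really hold 50 units each.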
[jira] [Updated] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare
[ https://issues.apache.org/jira/browse/YARN-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4134: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4752 > FairScheduler preemption stops at queue level that all child queues are not > over their fairshare > > > Key: YARN-4134 > URL: https://issues.apache.org/jira/browse/YARN-4134 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4134.001.patch, YARN-4134.002.patch, > YARN-4134.003.patch > > > Now FairScheduler uses a choose-a-candidate method to select a container to be > preempted from leaf queues, in {{FSParentQueue.preemptContainer()}}, > {code} > readLock.lock(); > try { > for (FSQueue queue : childQueues) { > if (candidateQueue == null || > comparator.compare(queue, candidateQueue) > 0) { > candidateQueue = queue; > } > } > } finally { > readLock.unlock(); > } > // Let the selected queue choose which of its container to preempt > if (candidateQueue != null) { > toBePreempted = candidateQueue.preemptContainer(); > } > {code} > a candidate child queue is selected. However, if the queue's usage isn't over > its fairshare, preemption will not happen: > {code} > if (!preemptContainerPreCheck()) { > return toBePreempted; > } > {code} > A scenario: > {code} > root >/\ > queue1 queue2 >/\ > queue2.3, ( queue2.4 ) > {code} > suppose there're 8 containers, and queues at any level have the same weight. > queue1 takes 4 and queue2.3 takes 4, so both queue1 and queue2 are at their > fairshare. Now we submit an app in queue2.4 needing 4 containers; it > should preempt 2 from queue2.3, but the candidate-container selection > procedure will stop at queue1, so none of the containers will be preempted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
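Why the selection stops can be shown with a minimal sketch of the choose-a-candidate step. This is an illustrative reduction of the logic quoted above, not the real `FSParentQueue` code: a single "most deserving" child is picked by how far it is over its fair share, and if that candidate passes the pre-check negatively, the walk gives up without ever looking inside the other subtree.

```java
// Sketch: pick one candidate child, then bail out if it is not over
// its fair share -- mirroring the quoted FSParentQueue behavior.
class CandidateSketch {
  static String pickCandidate(String[] queues, int[] usage, int[] fairShare) {
    int best = 0;
    for (int i = 1; i < queues.length; i++) {
      // comparator stand-in: prefer the child most over its fair share
      if (usage[i] - fairShare[i] > usage[best] - fairShare[best]) {
        best = i;
      }
    }
    // preemptContainerPreCheck stand-in: give up unless over fair share
    return usage[best] > fairShare[best] ? queues[best] : null;
  }
}
```

In the 8-container scenario, queue1 and queue2 both sit exactly at their fair share of 4, so the pre-check fails for whichever child is picked and nothing is preempted, even though queue2.3 holds all of queue2's share while queue2.4 starves.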
[jira] [Updated] (YARN-3902) Fair scheduler preempts ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3902: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4752 > Fair scheduler preempts ApplicationMaster > - > > Key: YARN-3902 > URL: https://issues.apache.org/jira/browse/YARN-3902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.3.0 > Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 > (2014-12-08) x86_64 >Reporter: He Tianyi >Assignee: He Tianyi > Original Estimate: 72h > Remaining Estimate: 72h > > YARN-2022 fixed the similar issue for CapacityScheduler. > However, FairScheduler still suffers from it, preempting the AM while other normal > containers are running. > I think we should take the same approach, avoiding the AM being preempted unless > there is no container running other than the AM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4133) Containers to be preempted leak in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4133: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4752 > Containers to be preempted leak in FairScheduler preemption logic. > -- > > Key: YARN-4133 > URL: https://issues.apache.org/jira/browse/YARN-4133 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4133.000.patch > > > Containers to be preempted leak in FairScheduler preemption logic. It may > cause missed preemption due to containers in {{warnedContainers}} being wrongly > removed. The problem is in {{preemptResources}}: > There are two issues which can cause containers to be wrongly removed from > {{warnedContainers}}: > Firstly, the container state {{RMContainerState.ACQUIRED}} is missing in the > condition check: > {code} > (container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) > {code} > Secondly, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we > shouldn't remove the container from {{warnedContainers}}. We should only remove > a container from {{warnedContainers}} if the container is in none of the states > {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, and > {{RMContainerState.ACQUIRED}}. > {code} > if ((container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) && > isResourceGreaterThanNone(toPreempt)) { > warnOrKillContainer(container); > Resources.subtractFrom(toPreempt, > container.getContainer().getResource()); > } else { > warnedIter.remove(); > } > {code} > Also once the containers in {{warnedContainers}} are wrongly removed, they will > never be preempted, because these containers are already in > {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't > return the containers in {{FSAppAttempt#preemptionMap}}. 
> {code} > public RMContainer preemptContainer() { > if (LOG.isDebugEnabled()) { > LOG.debug("App " + getName() + " is going to preempt a running " + > "container"); > } > RMContainer toBePreempted = null; > for (RMContainer container : getLiveContainers()) { > if (!getPreemptionContainers().contains(container) && > (toBePreempted == null || > comparator.compare(toBePreempted, container) > 0)) { > toBePreempted = container; > } > } > return toBePreempted; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
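The two fixes the report proposes can be reduced to a small decision table; the sketch below is illustrative (simplified enum and return values), not the attached patch. A container is treated as live in RUNNING, ALLOCATED, or ACQUIRED state; it is preempted while resources are still needed, kept in `warnedContainers` when live but no longer needed this round, and removed only when truly dead.

```java
// Sketch of the corrected warnedContainers handling described in YARN-4133.
class PreemptSketch {
  enum State { NEW, ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

  static String decide(State state, boolean stillNeedToPreempt) {
    boolean live = state == State.RUNNING
        || state == State.ALLOCATED
        || state == State.ACQUIRED;   // fix 1: ACQUIRED also counts as live
    if (live && stillNeedToPreempt) {
      return "PREEMPT";               // warnOrKillContainer + subtract resources
    }
    if (!live) {
      return "REMOVE";                // fix 2: remove only truly dead containers
    }
    return "KEEP";                    // live, but nothing left to preempt this round
  }
}
```

The buggy version instead removed a live ACQUIRED container (and any live container once `toPreempt` ran out), after which it could never be selected again because it already sits in `preemptionMap`.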
[jira] [Updated] (YARN-1961) Fair scheduler preemption doesn't work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1961: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4752 > Fair scheduler preemption doesn't work for non-leaf queues > -- > > Key: YARN-1961 > URL: https://issues.apache.org/jira/browse/YARN-1961 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler, scheduler >Affects Versions: 2.4.0 >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: scheduler > > Setting minResources and minSharePreemptionTimeout to a non-leaf queue > doesn't cause preemption to happen when that non-leaf queue is below > minResources and there are outstanding demands in that non-leaf queue. > Here is an example fs allocation config (partial): > {code:xml} > <queue name="abc"> > <minResources>3072 mb,0 vcores</minResources> > <minSharePreemptionTimeout>30</minSharePreemptionTimeout> > ... > </queue> > {code} > With the above configs, preemption doesn't seem to happen if queue abc is > below minShare and it has outstanding unsatisfied demands from apps in its > child queues. Ideally in such cases we would like preemption to kick off and > reclaim resources from other queues (not under queue abc). > Looking at the code it seems like preemption checks for starvation only at > the leaf queue level and not at the parent level. > {code:title=FairScheduler.java|borderStyle=solid} > boolean isStarvedForMinShare(FSLeafQueue sched) > boolean isStarvedForFairShare(FSLeafQueue sched) > {code} > This affects our use case where we have a parent queue with probably a 100 > unconfigured leaf queues under it. We want to give a minshare to the parent > queue to protect all the leaf queues under it, but we cannot do it due to this > bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3121) FairScheduler preemption metrics
[ https://issues.apache.org/jira/browse/YARN-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3121: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-4752 > FairScheduler preemption metrics > > > Key: YARN-3121 > URL: https://issues.apache.org/jira/browse/YARN-3121 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: yARN-3121.prelim.patch, yARN-3121.prelim.patch > > > Add FSQueuemetrics for preemption related information -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3414) FairScheduler's preemption may cause livelock
[ https://issues.apache.org/jira/browse/YARN-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3414: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4752 > FairScheduler's preemption may cause livelock > - > > Key: YARN-3414 > URL: https://issues.apache.org/jira/browse/YARN-3414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Peng Zhang > > I met this problem in our cluster; it causes livelock between preemption and > scheduling. > Queue hierarchy described as below: > {noformat} > root > /|\ > queue-1queue-2queue-3 > /\ > queue-1-1 queue-1-2 > {noformat} > # Assume cluster resource is 100G in memory > # Assume queue-1 has max resource limit 20G > # queue-1-1 is active and it will get max 20G memory (equal to its fairshare) > # queue-2 becomes active then, and it requires 30G memory (less than its fairshare) > # queue-3 becomes active, and it can be assigned all other resources, 50G > memory (larger than its fairshare). At this point the three queues' fair shares are (20, > 40, 40), and usage is (20, 30, 50) > # queue-1-2 becomes active; it causes a new preemption request (10G memory, and > intuitively it can only preempt from its sibling queue-1-1) > # Actually preemption starts from root, and it will find queue-3 is most over its > fairshare, and preempt some resources from queue-3. > # But during scheduling, it will find queue-1 itself has reached its max > share, and cannot assign resources to it. Then resources are again assigned > to queue-3 > And then it repeats the last two steps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3054) Preempt policy in FairScheduler may cause mapreduce job never finish
[ https://issues.apache.org/jira/browse/YARN-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3054: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4752 > Preempt policy in FairScheduler may cause mapreduce job never finish > > > Key: YARN-3054 > URL: https://issues.apache.org/jira/browse/YARN-3054 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Peng Zhang > > The preemption policy is tied to the scheduling policy now. Using the comparator of the > scheduling policy to find a preemption candidate cannot guarantee that some subset of > containers is never preempted. This may cause tasks to be preempted > periodically before they finish, so the job cannot make any progress. > I think preemption in YARN should give the assurances below: > 1. Mapreduce jobs can get additional resources when others are idle; > 2. Mapreduce jobs for one user in one queue can still progress with their min > share when others preempt resources back. > Maybe always preempting the latest app and container can achieve this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4661) Per-queue preemption policy in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4661: --- Issue Type: Sub-task (was: Wish) Parent: YARN-4752 > Per-queue preemption policy in FairScheduler > > > Key: YARN-4661 > URL: https://issues.apache.org/jira/browse/YARN-4661 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: He Tianyi >Priority: Minor > > When {{FairScheduler}} needs to preempt a container, it tries to find a > container by hierarchically sorting and selecting the {{AppSchedulable}} most > 'over fairshare' (in {{FairSharePolicy}}), and picking its latest launched > container. > In some cases, the strategy above becomes non-optimal; one may want to kill the latest > container (not {{AppSchedulable}}) launched in the queue for a better trade-off > between fairness and efficiency. Since most apps over fairshare tend to > have been started longer ago than other apps, even their latest launched > containers may have been running for quite some time. > Maybe besides {{policy}}, we make it possible to also specify a > {{preemptionPolicy}} only for selecting the container to preempt, without > changing the scheduling policy. > For example: > {quote} > <policy>fifo</policy> > <preemptionPolicy>fair</preemptionPolicy> > {quote} > Any suggestions or comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3903: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-4752 > Disable preemption at Queue level for Fair Scheduler > > > Key: YARN-3903 > URL: https://issues.apache.org/jira/browse/YARN-3903 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0 > Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 > (2014-12-08) x86_64 >Reporter: He Tianyi >Priority: Trivial > Attachments: YARN-3093.1.patch, YARN-3093.2.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > YARN-2056 supports disabling preemption at queue level for CapacityScheduler. > As for fair scheduler, we recently encountered the same need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174833#comment-15174833 ] Rohith Sharma K S commented on YARN-4002: - All the methods in HostsFileReader are synchronized, and the method {{isValidNode}} does 2 separate calls to hostsReader. There could be a scenario (if the lock is not used) where, after executing {{hostsReader.getHosts();}}, the hosts reader does a refresh, which gives an updated result for {{hostsReader.getExcludedHosts();}} but stale host details from the getHosts call. A lockless read might mix up old and new values, which is incorrect. > make ResourceTrackerService.nodeHeartbeat more concurrent > - > > Key: YARN-4002 > URL: https://issues.apache.org/jira/browse/YARN-4002 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, > YARN-4002-rwlock.patch, YARN-4002-v0.patch > > > We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By > design the method ResourceTrackerService.nodeHeartbeat should be concurrent > enough to scale for large clusters. > But we have a "BIG" lock in NodesListManager.isValidNode which I think is > unnecessary. > First, the fields "includes" and "excludes" of HostsFileReader are only > updated on "refresh nodes". All RPC threads handling node heartbeats are > only readers. So an RWLock could be used to allow concurrent access by RPC > threads. > Second, since the fields "includes" and "excludes" of HostsFileReader are > always updated by "reference assignment", which is atomic in Java, the reader > side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
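The torn-read concern above (two separate getter calls straddling a refresh and mixing generations) has a common remedy: publish both sets in one immutable holder that is swapped atomically through a volatile reference. The sketch below illustrates that idea only; the class and method names are hypothetical, not the actual Hadoop fix.

```java
import java.util.Set;

// Sketch: bundle includes/excludes in one immutable holder so a reader sees
// either the old pair or the new pair, never a mix of the two.
class HostDetailsSketch {
  static final class HostDetails {
    final Set<String> includes;
    final Set<String> excludes;
    HostDetails(Set<String> inc, Set<String> exc) {
      this.includes = Set.copyOf(inc);
      this.excludes = Set.copyOf(exc);
    }
  }

  // Volatile reference assignment is atomic; no lock on the read path.
  private volatile HostDetails current = new HostDetails(Set.of(), Set.of());

  void refresh(Set<String> inc, Set<String> exc) {
    current = new HostDetails(inc, exc);   // single atomic publish
  }

  boolean isValidNode(String host) {
    HostDetails d = current;               // one read: both sets same generation
    return (d.includes.isEmpty() || d.includes.contains(host))
        && !d.excludes.contains(host);
  }
}
```

This keeps the lockless read path the YARN-4002 patches aim for while still satisfying the consistency requirement in the comment: the includes and excludes a heartbeat sees always come from the same refresh.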
[jira] [Commented] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174852#comment-15174852 ] Hadoop QA commented on YARN-3903: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} YARN-3903 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12746093/YARN-3093.2.patch | | JIRA Issue | YARN-3903 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10681/console | | Powered by | Apache Yetus 0.3.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Disable preemption at Queue level for Fair Scheduler > > > Key: YARN-3903 > URL: https://issues.apache.org/jira/browse/YARN-3903 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0 > Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 > (2014-12-08) x86_64 >Reporter: He Tianyi >Priority: Trivial > Attachments: YARN-3093.1.patch, YARN-3093.2.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > YARN-2056 supports disabling preemption at queue level for CapacityScheduler. > As for fair scheduler, we recently encountered the same need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4133) Containers to be preempted leak in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174866#comment-15174866 ] Hadoop QA commented on YARN-4133: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 3s {color} | {color:red} YARN-4133 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12754810/YARN-4133.000.patch | | JIRA Issue | YARN-4133 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10684/console | | Powered by | Apache Yetus 0.3.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Containers to be preempted leak in FairScheduler preemption logic. > -- > > Key: YARN-4133 > URL: https://issues.apache.org/jira/browse/YARN-4133 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-4133.000.patch > > > Containers to be preempted leak in FairScheduler preemption logic. It may > cause missing preemption due to containers in {{warnedContainers}} wrongly > removed. The problem is in {{preemptResources}}: > There are two issues which can cause containers wrongly removed from > {{warnedContainers}}: > Firstly missing the container state {{RMContainerState.ACQUIRED}} in the > condition check: > {code} > (container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) > {code} > Secondly if {{isResourceGreaterThanNone(toPreempt)}} return false, we > shouldn't remove container from {{warnedContainers}}. 
We should only remove a > container from {{warnedContainers}} if the container is not in state > {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or > {{RMContainerState.ACQUIRED}}. > {code} > if ((container.getState() == RMContainerState.RUNNING || > container.getState() == RMContainerState.ALLOCATED) && > isResourceGreaterThanNone(toPreempt)) { > warnOrKillContainer(container); > Resources.subtractFrom(toPreempt, > container.getContainer().getResource()); > } else { > warnedIter.remove(); > } > {code} > Also, once containers in {{warnedContainers}} are wrongly removed, they will > never be preempted, because these containers are already in > {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't > return the containers in {{FSAppAttempt#preemptionMap}}. > {code} > public RMContainer preemptContainer() { > if (LOG.isDebugEnabled()) { > LOG.debug("App " + getName() + " is going to preempt a running " + > "container"); > } > RMContainer toBePreempted = null; > for (RMContainer container : getLiveContainers()) { > if (!getPreemptionContainers().contains(container) && > (toBePreempted == null || > comparator.compare(toBePreempted, container) > 0)) { > toBePreempted = container; > } > } > return toBePreempted; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case
[ https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174865#comment-15174865 ] Hadoop QA commented on YARN-3405: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} YARN-3405 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12727554/YARN-3405.02.patch | | JIRA Issue | YARN-3405 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10683/console | | Powered by | Apache Yetus 0.3.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > FairScheduler's preemption cannot happen between sibling in some case > - > > Key: YARN-3405 > URL: https://issues.apache.org/jira/browse/YARN-3405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Labels: BB2015-05-TBR > Attachments: YARN-3405.01.patch, YARN-3405.02.patch > > > Queue hierarchy described as below: > {noformat} > root >/ \ >queue-1 queue-2 > / \ > queue-1-1 queue-1-2 > {noformat} > Assume cluster resource is 100 > # queue-1-1 and queue-2 has app. Each get 50 usage and 50 fairshare. > # When queue-1-2 is active, and it cause some new preemption request for > fairshare 25. > # When preemption from root, it has possibility to find preemption candidate > is queue-2. If so preemptContainerPreCheck for queue-2 return false because > it's equal to its fairshare. > # Finally queue-1-2 will be waiting for resource release form queue-1-1 > itself. 
> What I expect here is that queue-1-2 should preempt from queue-1-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174870#comment-15174870 ] wanglei-it commented on YARN-1506: -- Thanks for your reply. > Replace set resource change on RMNode/SchedulerNode directly with event > notification. > - > > Key: YARN-1506 > URL: https://issues.apache.org/jira/browse/YARN-1506 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, > YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, > YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, > YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, > YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, > YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch > > > According to Vinod's comments on YARN-312 > (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), > we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174874#comment-15174874 ] Rohith Sharma K S commented on YARN-4478: - A few test cases are consistently failing because of UnknownHostException; a detailed analysis is given in HADOOP-12687. This is mainly because of the YARN precommit build machine's hostname. I have raised INFRA JIRA INFRA-11150 to change the YARN precommit build machine hostname, but there has been no response from the INFRA team. Does anyone, or any PMC member, know whom to contact for resolving INFRA-11150? > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing, either timing out or being impacted by > new bug fixes. Many test failure JIRAs have been raised and are in progress. > This is to track all the test failure JIRAs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174888#comment-15174888 ] Allen Wittenauer commented on YARN-4478: * There are plenty of examples where the Jenkins network connectivity fails, which of course would also cause DNS failures... * Changing the Jenkins servers isn't likely to fix anything given that all of the tests run in a docker container that the Hadoop project itself controls. > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing, either timing out or being impacted by > new bug fixes. Many test failure JIRAs have been raised and are in progress. > This is to track all the test failure JIRAs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928
[ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4712: Attachment: YARN-4712-YARN-2928.v1.002.patch Hi [~varun_saxena], bq. I was incorrectly assuming that CPU % is reported to NMTimelinePublisher in the range of 0-1. This doesn't seem to be the case though. You are right; when I tested it, I was also able to get a percentage greater than the number of cores, so multiplying by 100 is not required. I also felt *round* is better than floor, and have incorporated the required changes. bq. 2 of the checkstyle issues seem fixable. Well, I have corrected them, but I generally use the Eclipse formatter, which follows the Sun conventions mentioned in the [Hadoop wiki|https://wiki.apache.org/hadoop/HowToContribute], so the Eclipse formatter usually takes care of 80 chars per line wherever possible. Is anything required beyond that? cc/ [~sjlee0] > CPU Usage Metric is not captured properly in YARN-2928 > -- > > Key: YARN-4712 > URL: https://issues.apache.org/jira/browse/YARN-4712 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-4712-YARN-2928.v1.001.patch, > YARN-4712-YARN-2928.v1.002.patch > > > There are 2 issues with CPU usage collection > * I observed that many times the CPU usage obtained from > {{pTree.getCpuUsagePercent()}} is > ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor still does > the calculation {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore > /resourceCalculatorPlugin.getNumProcessors()}}, because of which the UNAVAILABLE > check in {{NMTimelinePublisher.reportContainerResourceUsage}} is never > triggered, so proper checks need to be added > * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but > ContainerMonitor is publishing decimal values for the CPU usage. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
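The two fixes discussed in the YARN-4712 thread above can be sketched in a few lines. This is a hypothetical helper, not the actual NMTimelinePublisher/ContainersMonitor code: it guards the UNAVAILABLE (-1) sentinel before doing the per-core division, and uses round (rather than floor) when converting the decimal percentage to the long value that the metric column's LongConverter expects.

```java
// Hypothetical sketch of the guard and rounding discussed above.
// Names (CpuUsageGuard, toTotalCoresPercent) are illustrative only.
class CpuUsageGuard {
    // Sentinel mirroring ResourceCalculatorProcessTree.UNAVAILABLE.
    static final float UNAVAILABLE = -1.0f;

    // Returns the metric to publish, or null when usage was unavailable,
    // so the caller can skip publishing instead of storing a bogus value.
    static Long toTotalCoresPercent(float cpuUsagePercentPerCore, int numProcessors) {
        if (cpuUsagePercentPerCore == UNAVAILABLE || numProcessors <= 0) {
            return null; // don't divide -1 by the core count and publish it
        }
        float total = cpuUsagePercentPerCore / numProcessors;
        // Round (not floor) before handing the value to a long-typed metric.
        return (long) Math.round(total);
    }
}
```

For example, 150% per-core on a 4-processor node becomes 37.5, which rounds to 38, while an UNAVAILABLE reading yields null and is never published.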
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174968#comment-15174968 ] Rohith Sharma K S commented on YARN-4478: - Per the analysis in HADOOP-12687, the test failures are not because of network connectivity taking the DNS server down. The Hadoop security model follows RFC standards. In this [comment|https://issues.apache.org/jira/browse/HADOOP-12687?focusedCommentId=15087185&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15087185], Varun talks about RFC 1535, which says a hostname must end with a dot ("."). But the Jenkins machines' hostnames are not configured per this RFC, which is causing the test failures. I was able to reproduce this on Ubuntu: after changing the hostname to end with a dot ("."), these test cases pass. The same change needs to be made on the YARN precommit build machine too. > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing, either timing out or being impacted by > new bug fixes. Many test failure JIRAs have been raised and are in progress. > This is to track all the test failure JIRAs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174995#comment-15174995 ] Allen Wittenauer commented on YARN-4478: You realize that RFC is talking about DNS and not /etc/hosts, right? It's specifically to prevent the DNS resolver from adding more domains during resolution. Also, in 20+ years of Unix system administration, I have never configured /etc/hosts with an ending period. That's because /etc/hosts resolution isn't supposed to go through the DNS resolver at all. > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing, either timing out or being impacted by > new bug fixes. Many test failure JIRAs have been raised and are in progress. > This is to track all the test failure JIRAs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175002#comment-15175002 ] Rohith Sharma K S commented on YARN-4478: - bq. Also, in 20+ years of Unix system administration, I have never configured /etc/hosts with an ending period. That's because /etc/hosts resolution isn't supposed to go through the DNS resolver at all. Cool... Then do you think the original patch of HADOOP-12687 can go in? > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing, either timing out or being impacted by > new bug fixes. Many test failure JIRAs have been raised and are in progress. > This is to track all the test failure JIRAs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
[ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175020#comment-15175020 ] Hadoop QA commented on YARN-4700: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 49s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 30s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 10s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 26s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s {color} | {color:green} YARN-2928 passed with JDK 
v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 2s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 5s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 5s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 7s {color} | {color:red} root: patch generated 1 new + 80 unchanged - 0 fixed = 81 total (was 80) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 56s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.8.0_72. 
{color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 49s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-jdk1.7.0_95 with JDK v1.7.0_95 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 26s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 1s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 38s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 8s {color} | {col
[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928
[ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175029#comment-15175029 ] Hadoop QA commented on YARN-4712: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 22s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} YARN-2928 passed with 
JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 1 new + 22 unchanged - 2 fixed = 23 total (was 24) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 46s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_72. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 14s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 35m 27s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790851/YARN-4712-YARN-2928.v1.002.patch | | JIRA Issue | YARN-4712 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux c78f28d74503 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Pe
[jira] [Created] (YARN-4753) Use doxia macro to generate in-page TOC of YARN site documentation
Masatake Iwasaki created YARN-4753: -- Summary: Use doxia macro to generate in-page TOC of YARN site documentation Key: YARN-4753 URL: https://issues.apache.org/jira/browse/YARN-4753 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Since maven-site-plugin 3.5 was released, we can use the toc macro in Markdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4753) Use doxia macro to generate in-page TOC of YARN site documentation
[ https://issues.apache.org/jira/browse/YARN-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-4753: --- Description: Since maven-site-plugin 3.5 was released, we can use toc macro in Markdown. (was: Since maven-site-plugin 3.5 was releaced, we can use toc macro in Markdown.) > Use doxia macro to generate in-page TOC of YARN site documentation > -- > > Key: YARN-4753 > URL: https://issues.apache.org/jira/browse/YARN-4753 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.7.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > > Since maven-site-plugin 3.5 was released, we can use toc macro in Markdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
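For context on the proposal above: with maven-site-plugin 3.5+, a Doxia macro can be embedded in a Markdown page as an HTML comment, so the generated site gets an in-page table of contents. A minimal sketch (the depth parameters here are illustrative, not mandated by the JIRA):

```markdown
<!-- MACRO{toc|fromDepth=0|toDepth=3} -->

# ResourceManager

## High Availability

## Restart
```

At site-generation time the macro line is replaced by a nested list of links to the headings that fall within the configured depth range.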
[jira] [Commented] (YARN-4750) App metrics may not be correct when an app is recovered
[ https://issues.apache.org/jira/browse/YARN-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175105#comment-15175105 ] Srikanth Sampath commented on YARN-4750: Agree [~jianhe] that it would be expensive to do updates periodically. However, it would be useful to indicate that the metrics are compromised. One option is to set the value to a special value (say, a negative number) to indicate a compromised value one time. Just carrying on silently can be misleading. > App metrics may not be correct when an app is recovered > --- > > Key: YARN-4750 > URL: https://issues.apache.org/jira/browse/YARN-4750 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > > App metrics (rather, app attempt metrics) like Vcore-seconds and MB-seconds are > saved in the state store when there is an attempt state transition. Values > for running attempts will be in memory and will not be saved when there is an > RM restart/failover. For a recovered app the metrics value will be reset. In that > case, these values will be incomplete. > Was this intentional, or have we not found a correct way to fix it? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE
[ https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175180#comment-15175180 ] Sidharta Seethana commented on YARN-4744: - /cc [~vvasudev] It looks like this is an artifact of existing NM behavior - the NM appears to signal containers that have already exited ( as a part of {{ContainerLaunch.cleanupContainer()}} ) . This signal operation fails because the process has already exited. These failures were not logged before but they are being logged now because of the centralization of container-executor operations via {{PrivilegedOperationExecutor}} - which logs all container-executor failures. > Too many signal to container failure in case of LCE > --- > > Key: YARN-4744 > URL: https://issues.apache.org/jira/browse/YARN-4744 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Bibin A Chundatt >Assignee: Sidharta Seethana > > Install HA cluster in secure mode > Enable LCE with cgroups > Start server with dsperf user > Submit mapreduce application terasort/teragen with user yarn/dsperf > Too many signal to container failure > Submit with user the exception is thrown > {noformat} > 2014-03-02 09:20:38,689 INFO > SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: > Authorization successful for testing (auth:TOKEN) for protocol=interface > org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB > 2014-03-02 09:20:40,158 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Event EventType: KILL_CONTAINER sent to absent container > container_e02_1393731146548_0001_01_13 > 2014-03-02 09:20:43,071 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Container container_e02_1393731146548_0001_01_09 succeeded > 2014-03-02 09:20:43,072 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container 
container_e02_1393731146548_0001_01_09 transitioned from > RUNNING to EXITED_WITH_SUCCESS > 2014-03-02 09:20:43,073 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Cleaning up container container_e02_1393731146548_0001_01_09 > 2014-03-02 09:20:43,075 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime: > Using container runtime: DefaultLinuxContainerRuntime > 2014-03-02 09:20:43,081 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > Shell execution returned exit code: 9. Privileged Execution Operation Output: > main : command provided 2 > main : run as user is yarn > main : requested yarn user is yarn > Full command array for failed execution: > [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, > yarn, yarn, 2, 9370, 15] > 2014-03-02 09:20:43,081 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime: > Signal container failed. 
Exception: > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=9: > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > Caused by: ExitCodeException exitCode=9: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:927) > at org.apache.hadoop.util.Shell.run(Shell.java:838) > at > org.apache.hadoop.util.Sh