[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400996#comment-15400996 ] Hadoop QA commented on YARN-4676: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 43s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 51s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 40s {color} | {color:green} trunk 
passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 47s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 34s {color} | {color:red} root: The patch generated 15 new + 678 unchanged - 6 fixed = 693 total (was 684) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 48s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 2 new + 963 unchanged - 0 fixed = 965 total (was 963) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 10s {color} | {color:green} hadoop-project in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 2s {color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 16s {color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 37m 40s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 59s {color} | {color:red} hadoop-yarn-client in the patch failed. {co
[jira] [Comment Edited] (YARN-5455) LinuxContainerExecutor needs Javadocs
[ https://issues.apache.org/jira/browse/YARN-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400972#comment-15400972 ] Allen Wittenauer edited comment on YARN-5455 at 7/31/16 5:54 AM: - bq. This class provides execution for Linux, trading broader platform support for the ability to take advantage of the capabilities of Linux. This isn't actually correct. Other than cgroups (which are optional), there's nothing here actually Linux specific. It runs and has run fine on non-Linux other than Windows for years and years. You might be thinking of LinuxResourceCalculatorPlugin which most definitely is not portable. was (Author: aw): bq. This class provides execution for Linux, trading broader platform support for the ability to take advantage of the capabilities of Linux. This isn't actually correct. Other than cgroups (which are optional), there's nothing here actually Linux specific. It runs and has run fine on non-Linux other than Windows for years and years. > LinuxContainerExecutor needs Javadocs > - > > Key: YARN-5455 > URL: https://issues.apache.org/jira/browse/YARN-5455 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-5455.001.patch > > > 'Nuff said. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5261) Lease/Reclaim Extension to Yarn
[ https://issues.apache.org/jira/browse/YARN-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu updated YARN-5261: - Attachment: YARN-5261-5.patch > Lease/Reclaim Extension to Yarn > --- > > Key: YARN-5261 > URL: https://issues.apache.org/jira/browse/YARN-5261 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Yu > Attachments: YARN-5261-1.patch, YARN-5261-2.patch, YARN-5261-3.patch, > YARN-5261-4.patch, YARN-5261-5.patch, Yarn-5261.pdf > > > In some clusters outside of Yarn, the machines' resources are not fully > utilized, e.g., resource usage is quite low at night. > To better utilize the resources while keeping the existing SLA of the cluster, > the Lease/Reclaim Extension to Yarn is introduced. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5455) LinuxContainerExecutor needs Javadocs
[ https://issues.apache.org/jira/browse/YARN-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400972#comment-15400972 ] Allen Wittenauer edited comment on YARN-5455 at 7/31/16 5:53 AM: - bq. This class provides execution for Linux, trading broader platform support for the ability to take advantage of the capabilities of Linux. This isn't actually correct. Other than cgroups (which are optional), there's nothing here actually Linux specific. It runs and has run fine on non-Linux other than Windows for years and years. was (Author: aw): bq. This class provides execution for Linux, trading broader platform support for the ability to take advantage of the capabilities of Linux. This isn't actually correct. Other than cgroups, there's nothing here actually Linux specific. It runs and has run fine on non-Linux other than Windows for years and years. > LinuxContainerExecutor needs Javadocs > - > > Key: YARN-5455 > URL: https://issues.apache.org/jira/browse/YARN-5455 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-5455.001.patch > > > 'Nuff said. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5455) LinuxContainerExecutor needs Javadocs
[ https://issues.apache.org/jira/browse/YARN-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400972#comment-15400972 ] Allen Wittenauer commented on YARN-5455: bq. This class provides execution for Linux, trading broader platform support for the ability to take advantage of the capabilities of Linux. This isn't actually correct. Other than cgroups, there's nothing here actually Linux specific. It runs and has run fine on non-Linux other than Windows for years and years. > LinuxContainerExecutor needs Javadocs > - > > Key: YARN-5455 > URL: https://issues.apache.org/jira/browse/YARN-5455 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-5455.001.patch > > > 'Nuff said. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5394) Remove bind-mount /etc/passwd to Docker Container
[ https://issues.apache.org/jira/browse/YARN-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400963#comment-15400963 ] Zhankun Tang commented on YARN-5394: Changed the patch name. Setting "Affects version" and "Fix version" to 2.8.0 failed on submission due to the unrelated unit test failure mentioned above, and leaving them empty also does not work. The patch is generated based on branch-2.8. [~sidharta-s], could you please tell me what I am missing here? Thanks > Remove bind-mount /etc/passwd to Docker Container > - > > Key: YARN-5394 > URL: https://issues.apache.org/jira/browse/YARN-5394 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang > Attachments: YARN-5394.002.patch > > > Current LCE (DockerLinuxContainerRuntime) bind-mounts /etc/passwd to the > container, and it seems to use the wrong file name "/etc/password" for the container. > {panel} > .addMountLocation("/etc/passwd", "/etc/password:ro"); > {panel} > The biggest issue of bind-mounting /etc/passwd is that it overrides the users > defined in the Docker image, which is not expected. Removing it won't affect > existing use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
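The mount call quoted in the issue can be made concrete with a small sketch. The helper `dockerVolumeArg` is hypothetical, not part of DockerLinuxContainerRuntime; it simply renders the docker `-v` argument the quoted spec would produce, showing both the misspelled target path and why a correctly spelled bind would shadow the image's own user database.

```java
// Illustrative sketch only: dockerVolumeArg is a hypothetical helper, not
// part of DockerLinuxContainerRuntime. It renders the docker -v argument
// that a "hostPath -> containerPath:mode" mount spec would produce.
public class MountSpecSketch {
    public static String dockerVolumeArg(String hostPath, String containerSpec) {
        return "-v " + hostPath + ":" + containerSpec;
    }

    public static void main(String[] args) {
        // As quoted in the issue: the container target is misspelled "/etc/password".
        System.out.println(dockerVolumeArg("/etc/passwd", "/etc/password:ro"));
        // prints: -v /etc/passwd:/etc/password:ro
        // A correctly spelled target would shadow the image's own /etc/passwd,
        // overriding the users defined in the image (the behavior being removed).
        System.out.println(dockerVolumeArg("/etc/passwd", "/etc/passwd:ro"));
        // prints: -v /etc/passwd:/etc/passwd:ro
    }
}
```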
[jira] [Issue Comment Deleted] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Zhi updated YARN-4676: - Comment: was deleted (was: I only just resolved all merge conflicts and compiler errors when reapplying the change on top of the latest Hadoop trunk, and generated YARN-4676.017.patch. It usually takes at least days to build a new functional EMR AMI due to the variety of breaking changes inside the trunk, after which I can test in a real cluster.) > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, > GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, > YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, > YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, > YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, > YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch > > > YARN-4676 implements an automatic, asynchronous and flexible mechanism to > gracefully decommission > YARN nodes. After the user issues the refreshNodes request, ResourceManager > automatically evaluates the > status of all affected nodes and kicks off decommission or recommission > actions. RM asynchronously > tracks container and application status related to DECOMMISSIONING nodes to > decommission the > nodes immediately after they are ready to be decommissioned. A decommissioning > timeout at individual-node > granularity is supported and can be dynamically updated. The > mechanism naturally supports multiple > independent graceful decommissioning "sessions" where each one involves > different sets of nodes with > different timeout settings. 
Such support is ideal and necessary for graceful > decommission requests issued > by external cluster management software rather than a human. > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING node status automatically and asynchronously after the > client/admin makes the graceful decommission request. It tracks > DECOMMISSIONING node status to decide when, after all running containers on > the node have completed, the node will be transitioned into the DECOMMISSIONED state. > NodesListManager detects and handles include and exclude list changes to kick > off decommission or recommission as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
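The tracking rule in the issue description can be sketched minimally. The `NodeState` values mirror the RM states named above, but `poll()` and its signature are illustrative, not the actual DecommissioningNodeWatcher API: a DECOMMISSIONING node transitions to DECOMMISSIONED only once its running-container count reaches zero.

```java
// Minimal sketch of the tracking rule described above; NodeState values
// mirror the RM states named in the description, but poll() is an
// illustrative stand-in, not the real DecommissioningNodeWatcher API.
public class DecomWatcherSketch {
    public enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    public static NodeState poll(NodeState current, int runningContainers) {
        // Only DECOMMISSIONING nodes with no remaining containers transition.
        if (current == NodeState.DECOMMISSIONING && runningContainers == 0) {
            return NodeState.DECOMMISSIONED;
        }
        return current;
    }

    public static void main(String[] args) {
        NodeState s = NodeState.DECOMMISSIONING;
        s = poll(s, 3);  // containers still running: stays DECOMMISSIONING
        s = poll(s, 0);  // all containers completed: becomes DECOMMISSIONED
        System.out.println(s);  // prints: DECOMMISSIONED
    }
}
```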
[jira] [Commented] (YARN-5394) Remove bind-mount /etc/passwd to Docker Container
[ https://issues.apache.org/jira/browse/YARN-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400957#comment-15400957 ] Hadoop QA commented on YARN-5394: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} YARN-5394 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821193/YARN-5394.002.patch | | JIRA Issue | YARN-5394 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12577/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Remove bind-mount /etc/passwd to Docker Container > - > > Key: YARN-5394 > URL: https://issues.apache.org/jira/browse/YARN-5394 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang > Attachments: YARN-5394.002.patch > > > Current LCE (DockerLinuxContainerRuntime) bind-mounts /etc/passwd to the > container, and it seems to use the wrong file name "/etc/password" for the container. > {panel} > .addMountLocation("/etc/passwd", "/etc/password:ro"); > {panel} > The biggest issue of bind-mounting /etc/passwd is that it overrides the users > defined in the Docker image, which is not expected. Removing it won't affect > existing use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5394) Remove bind-mount /etc/passwd to Docker Container
[ https://issues.apache.org/jira/browse/YARN-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-5394: --- Attachment: (was: YARN-5394-branch-2.8.002.patch) > Remove bind-mount /etc/passwd to Docker Container > - > > Key: YARN-5394 > URL: https://issues.apache.org/jira/browse/YARN-5394 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang > > Current LCE (DockerLinuxContainerRuntime) bind-mounts /etc/passwd to the > container, and it seems to use the wrong file name "/etc/password" for the container. > {panel} > .addMountLocation("/etc/passwd", "/etc/password:ro"); > {panel} > The biggest issue of bind-mounting /etc/passwd is that it overrides the users > defined in the Docker image, which is not expected. Removing it won't affect > existing use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5394) Remove bind-mount /etc/passwd to Docker Container
[ https://issues.apache.org/jira/browse/YARN-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-5394: --- Attachment: (was: YARN-5394-branch-2.8.001.patch) > Remove bind-mount /etc/passwd to Docker Container > - > > Key: YARN-5394 > URL: https://issues.apache.org/jira/browse/YARN-5394 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang > > Current LCE (DockerLinuxContainerRuntime) bind-mounts /etc/passwd to the > container, and it seems to use the wrong file name "/etc/password" for the container. > {panel} > .addMountLocation("/etc/passwd", "/etc/password:ro"); > {panel} > The biggest issue of bind-mounting /etc/passwd is that it overrides the users > defined in the Docker image, which is not expected. Removing it won't affect > existing use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5394) Remove bind-mount /etc/passwd to Docker Container
[ https://issues.apache.org/jira/browse/YARN-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-5394: --- Attachment: YARN-5394.002.patch > Remove bind-mount /etc/passwd to Docker Container > - > > Key: YARN-5394 > URL: https://issues.apache.org/jira/browse/YARN-5394 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang > Attachments: YARN-5394.002.patch > > > Current LCE (DockerLinuxContainerRuntime) bind-mounts /etc/passwd to the > container, and it seems to use the wrong file name "/etc/password" for the container. > {panel} > .addMountLocation("/etc/passwd", "/etc/password:ro"); > {panel} > The biggest issue of bind-mounting /etc/passwd is that it overrides the users > defined in the Docker image, which is not expected. Removing it won't affect > existing use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Zhi updated YARN-4676: - Attachment: YARN-4676.017.patch I reapplied the code on top of the latest trunk, managed to resolve all conflicts and compiler errors, and produced YARN-4676.017.patch. It usually takes at least days to build an EMR AMI after switching to a newer trunk due to the variety of breaking changes, after which I can test in a real cluster. > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, > GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, > YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, > YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, > YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, > YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch > > > YARN-4676 implements an automatic, asynchronous and flexible mechanism to > gracefully decommission > YARN nodes. After the user issues the refreshNodes request, ResourceManager > automatically evaluates the > status of all affected nodes and kicks off decommission or recommission > actions. RM asynchronously > tracks container and application status related to DECOMMISSIONING nodes to > decommission the > nodes immediately after they are ready to be decommissioned. A decommissioning > timeout at individual-node > granularity is supported and can be dynamically updated. The > mechanism naturally supports multiple > independent graceful decommissioning "sessions" where each one involves > different sets of nodes with > different timeout settings. 
Such support is ideal and necessary for graceful > decommission requests issued > by external cluster management software rather than a human. > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING node status automatically and asynchronously after the > client/admin makes the graceful decommission request. It tracks > DECOMMISSIONING node status to decide when, after all running containers on > the node have completed, the node will be transitioned into the DECOMMISSIONED state. > NodesListManager detects and handles include and exclude list changes to kick > off decommission or recommission as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400947#comment-15400947 ] Daniel Zhi commented on YARN-4676: -- Transcript of recent design discussions over email: From: Junping Du [mailto:j...@hortonworks.com] Sent: Friday, July 29, 2016 10:40 AM To: Robert Kanter; Zhi, Daniel Cc: Karthik Kambatla; Ming Ma Subject: Re: YARN-4676 discussion The plan sounds reasonable to me. Thanks Robert to coordinate on this. I just commit YARN-5434 to branch-2.8. So Daniel, please start your rebase work in your earliest convenient time - I will try the functionality and review the code also. About previous questions, it is better to have such a discussions (with technical details) on the JIRA so we won't lose track in future and it is helpful to enhance your work's visibility as more audiences sooner or later to watch this. From: Robert Kanter Sent: Friday, July 29, 2016 2:20 AM To: Zhi, Daniel Cc: Karthik Kambatla; Junping Du; Ming Ma Subject: Re: YARN-4676 discussion Sorry, I used milliseconds instead of seconds, but it sounds like my example was clear enough despite that. In the interest of moving things along, let's try to get YARN-4676 committed as soon as possible, and then take care of the remaining items as followup JIRAs. It sounds like some of these require more discussion, and it might be easier this way instead of holding up YARN-4676 until everything is perfect. So let's do this: 1. Junping will commit YARN-5434 to add the -client|server arguments tomorrow (Friday); for 2.8+ 2. Daniel will rebase YARN-4676 on top of the latest, and make any minor changes due to YARN-5434; for 2.9+ o Junping and I can do a final review and hopefully commit early next week 3. We'll open followup JIRAs for the following items where we can discuss/implement more: 1. Abstract out the host file format to allow the txt format, XML format, and JSON format 2. 
Figure out if we want to change the behavior of subsequent parallel calls to gracefully decom nodes Does that sound like a good plan? Did I miss anything? On Tue, Jul 26, 2016 at 10:50 PM, Zhi, Daniel wrote: Karthik: the timeout currently is always in units of seconds, both the command line arg and the internal variables. Robert's example should be "-refreshNodes -t 30". Robert: the timeout overwrite behavior you observed matches the code logic inside NodesListManager.java. The first "-refreshNodes -t 120" sets 120 on A. The second "-refreshNodes -t 30" sets 30 on both A and B. In this case, there is no timeout in the exclude file, so timeoutToUse is the request timeout (120 and 30), and A's timeout is updated to 30 by lines 287 + 311. To some extent, it is by design as both refreshes refer to the set of hosts inside the exclude host file. Specifically, the second refresh is about both A and B instead of B only, even though your intention is about B. In your case, if the timeout was specified inside the exclude host file, it would work as you expected. Although the code could be changed to not update an existing node timeout unless that timeout comes from the host file (per-node overwrite), I couldn't tell quickly whether it is better given possible side effects. This is something we can evaluate more.

252   private void handleExcludeNodeList(boolean graceful, Integer timeout) {
...
276     // Use per node timeout if exist otherwise the request timeout.
277     Integer timeoutToUse = (timeouts.get(n.getHostName()) != null)?
278         timeouts.get(n.getHostName()) : timeout;
...
283     } else if (s == NodeState.DECOMMISSIONING &&
284         !Objects.equals(n.getDecommissioningTimeout(),
285             timeoutToUse)) {
286       LOG.info("Update " + nodeStr + " timeout to be " + timeoutToUse);
287       nodesToDecom.add(n);
...
311     e = new RMNodeDecommissioningEvent(n.getNodeID(), timeoutToUse);

From: Karthik Kambatla [mailto:ka...@cloudera.com] Sent: Tuesday, July 26, 2016 6:08 PM To: Robert Kanter; Zhi, Daniel Cc: Junping Du; Ming Ma Subject: Re: YARN-4676 discussion Related, but orthogonal. Do we see value in using milliseconds for the timeout? Should it be seconds? On Tue, Jul 26, 2016 at 5:55 PM Robert Kanter wrote: I spoke with Junping and Karthik earlier today. We agreed that to simplify support for client-side tracking and server-side tracking, we should add a required -client|server argument. We'll add this in 2.8 to be compatible with the server-side tracking, when it's ready (probably 2.9); in the meantime, the -server flag will throw an Exception. This will help simplify things a little in that client-side tracking and server-side tracking are now mutually exclusive. For example yarn rmadmin -refreshNodes -g 1000 -client will do client-side track
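The timeout-resolution behavior Daniel walks through above can be condensed into a short sketch. Here `resolveTimeout` is a hypothetical stand-in for the logic around lines 276-278, not the actual NodesListManager method: a per-node timeout from the exclude host file wins, otherwise the `-refreshNodes -t` value applies to every listed node, which is why the second refresh overwrote A's timeout.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative condensation of the NodesListManager logic quoted above;
// resolveTimeout is a hypothetical name, not the actual method.
public class DecomTimeoutSketch {
    /** Per-node timeout from the exclude host file wins; otherwise fall
     *  back to the timeout passed on the command line (in seconds). */
    public static Integer resolveTimeout(Map<String, Integer> hostFileTimeouts,
                                         String host, Integer requestTimeout) {
        Integer perNode = hostFileTimeouts.get(host);
        return (perNode != null) ? perNode : requestTimeout;
    }

    public static void main(String[] args) {
        Map<String, Integer> timeouts = new HashMap<>();
        // First call, "-refreshNodes -t 120", exclude file lists A only.
        System.out.println(resolveTimeout(timeouts, "A", 120));  // prints: 120
        // Second call, "-refreshNodes -t 30", exclude file lists A and B.
        // No per-node entry exists, so A is overwritten to 30 as observed.
        System.out.println(resolveTimeout(timeouts, "A", 30));   // prints: 30
        // With a per-node timeout inside the host file, A keeps its own value.
        timeouts.put("A", 120);
        System.out.println(resolveTimeout(timeouts, "A", 30));   // prints: 120
    }
}
```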
[jira] [Updated] (YARN-5455) LinuxContainerExecutor needs Javadocs
[ https://issues.apache.org/jira/browse/YARN-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-5455: --- Attachment: YARN-5455.001.patch Look, ma, Javadocs! > LinuxContainerExecutor needs Javadocs > - > > Key: YARN-5455 > URL: https://issues.apache.org/jira/browse/YARN-5455 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-5455.001.patch > > > 'Nuff said. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5455) LinuxContainerExecutor needs Javadocs
Daniel Templeton created YARN-5455: -- Summary: LinuxContainerExecutor needs Javadocs Key: YARN-5455 URL: https://issues.apache.org/jira/browse/YARN-5455 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.8.0 Reporter: Daniel Templeton Assignee: Daniel Templeton 'Nuff said. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5113) Refactoring and other clean-up for distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400915#comment-15400915 ] Hadoop QA commented on YARN-5113: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 7 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 21s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall 
{color} | {color:green} 2m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 7 new + 363 unchanged - 47 fixed = 370 total (was 410) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common generated 0 new + 159 unchanged - 4 fixed = 159 total (was 163) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 0 new + 248 unchanged - 7 fixed = 248 total (was 255) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 14s {color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green}
[jira] [Updated] (YARN-5113) Refactoring and other clean-up for distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5113: - Attachment: YARN-5113.009.patch Thanks for the comments, [~asuresh]. Uploading a new patch. I added all the distributed/NM-queuing related parameters to yarn-default.xml and got rid of skipping those variables in {{TestYarnConfigurationFields}}. I also added/updated the comments for the Request and Response classes, and moved the finish method after the allocate one in the distributed_scheduling_am_protocol. As Arun points out, both "enable" and "enabled" are used in the {{YarnConfiguration}} for various parameters. I checked again and kept "enabled" for distributed scheduling, because it is used more often than "enable" (~40 vs. ~10 occurrences). > Refactoring and other clean-up for distributed scheduling > - > > Key: YARN-5113 > URL: https://issues.apache.org/jira/browse/YARN-5113 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Konstantinos Karanasos > Attachments: YARN-5113.001.patch, YARN-5113.002.patch, > YARN-5113.003.patch, YARN-5113.004.patch, YARN-5113.005.patch, > YARN-5113.006.patch, YARN-5113.007.patch, YARN-5113.008.patch, > YARN-5113.009.patch > > > This JIRA focuses on the refactoring of classes related to Distributed > Scheduling -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5454) Various places have a hard-coded location for bash
Allen Wittenauer created YARN-5454: -- Summary: Various places have a hard-coded location for bash Key: YARN-5454 URL: https://issues.apache.org/jira/browse/YARN-5454 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0-alpha1 Reporter: Allen Wittenauer Lots of places in nodemanager have the location of bash hard-coded to /bin/bash. This is not portable. bash should either be found via /usr/bin/env or have no path at all.
[jira] [Commented] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail
[ https://issues.apache.org/jira/browse/YARN-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400848#comment-15400848 ] Allen Wittenauer commented on YARN-5219: bq. Any shell variable substitution with empty value can be considered as error Why? It's perfectly valid to have an env var that exists but is empty. It seems like the real fix here is to use set -e so that any error in the generated launch_container.sh causes the script to exit and return a bad status code, rather than trying to predict what may or may not be valid env var definitions. > When an export var command fails in launch_container.sh, the full container > launch should fail > -- > > Key: YARN-5219 > URL: https://issues.apache.org/jira/browse/YARN-5219 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Sunil G > Attachments: YARN-5219-branch-2.001.patch, YARN-5219.001.patch, > YARN-5219.003.patch > > > Today, a container fails if certain files fail to localize. However, if > certain env vars fail to get set up properly, either due to bugs in the yarn > application or misconfiguration, the actual process launch still gets > triggered. This results in either confusing error messages if the process > fails to launch or, worse yet, the process launches but then starts behaving > wrongly if the env var is used to control some behavioral aspects. > In this scenario, the issue was reproduced by trying to do export > abc="$\{foo.bar}", which is invalid as var names cannot contain "." in bash.
[jira] [Commented] (YARN-5121) fix some container-executor portability issues
[ https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400741#comment-15400741 ] Allen Wittenauer commented on YARN-5121: Thanks for the reviews and for helping to get this in, guys! > fix some container-executor portability issues > -- > > Key: YARN-5121 > URL: https://issues.apache.org/jira/browse/YARN-5121 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Fix For: 3.0.0-alpha2 > > Attachments: YARN-5121.00.patch, YARN-5121.01.patch, > YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, > YARN-5121.06.patch, YARN-5121.07.patch, YARN-5121.08.patch > > > container-executor has some issues that are preventing it from even compiling > on the OS X jenkins instance. Let's fix those. While we're there, let's > also try to take care of some of the other portability problems that have > crept in over the years, since it used to work great on Solaris but now > doesn't.
[jira] [Commented] (YARN-5121) fix some container-executor portability issues
[ https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400722#comment-15400722 ] Hudson commented on YARN-5121: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10182 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10182/]) YARN-5121. fix some container-executor portability issues. Contributed (cnauroth: rev ef501b1a0b4c34a2cc43eb082d1c2364684cd7f1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/config.h.cmake * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/compat/fchmodat.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/compat/unlinkat.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/compat/fdopendir.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/get_executable.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/compat/openat.h * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.h * LICENSE.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/compat/fstatat.h > fix some container-executor portability issues > -- > > Key: YARN-5121 > URL: https://issues.apache.org/jira/browse/YARN-5121 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Fix For: 3.0.0-alpha2 > > Attachments: YARN-5121.00.patch, YARN-5121.01.patch, > YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, > YARN-5121.06.patch, YARN-5121.07.patch, YARN-5121.08.patch > > > container-executor has some issues that are preventing it from even compiling > on the OS X jenkins instance. Let's fix those. While we're there, let's > also try to take care of some of the other portability problems that have > crept in over the years, since it used to work great on Solaris but now > doesn't.
[jira] [Created] (YARN-5453) FairScheduler#update may skip update demand resource of child queue/app if current demand reached maxResource
sandflee created YARN-5453: -- Summary: FairScheduler#update may skip update demand resource of child queue/app if current demand reached maxResource Key: YARN-5453 URL: https://issues.apache.org/jira/browse/YARN-5453 Project: Hadoop YARN Issue Type: Bug Reporter: sandflee Assignee: sandflee {code} demand = Resources.createResource(0); for (FSQueue childQueue : childQueues) { childQueue.updateDemand(); Resource toAdd = childQueue.getDemand(); demand = Resources.add(demand, toAdd); demand = Resources.componentwiseMin(demand, maxRes); if (Resources.equals(demand, maxRes)) { break; } } {code} If a single queue's demand reaches maxRes, the loop breaks out early and the remaining child queues' demand resources are never updated.
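A minimal sketch of the direction a fix could take: keep calling updateDemand() on every child (the call has side effects inside each child queue) and only cap the parent's accumulation. The Queue class and the int-valued "resources" below are hypothetical stand-ins for illustration, not the real YARN FSQueue/Resources APIs.

```java
import java.util.Arrays;
import java.util.List;

public class DemandUpdateSketch {
    // Hypothetical stand-in for FSQueue: records whether updateDemand() ran.
    static class Queue {
        final int ownDemand;
        boolean demandUpdated = false;
        Queue(int ownDemand) { this.ownDemand = ownDemand; }
        void updateDemand() { demandUpdated = true; }  // side effect to preserve
        int getDemand() { return ownDemand; }
    }

    // Aggregate child demand, capped at maxRes, while still invoking
    // updateDemand() on every child. The original loop broke out of the
    // iteration once the cap was hit, skipping the remaining children.
    static int updateDemand(List<Queue> children, int maxRes) {
        int demand = 0;
        for (Queue child : children) {
            child.updateDemand();          // must run for every child
            if (demand < maxRes) {         // only the accumulation is capped
                demand = Math.min(demand + child.getDemand(), maxRes);
            }
        }
        return demand;
    }

    public static void main(String[] args) {
        List<Queue> children =
            Arrays.asList(new Queue(50), new Queue(10), new Queue(10));
        int demand = updateDemand(children, 40);
        // The parent's demand is still capped at maxRes...
        if (demand != 40) throw new AssertionError("demand=" + demand);
        // ...but every child, including those after the cap was hit, was refreshed.
        for (Queue q : children) {
            if (!q.demandUpdated) throw new AssertionError("child skipped");
        }
        System.out.println("ok");
    }
}
```

The key difference from the snippet in the report is that the early `break` is gone, so the per-child side effect always executes.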
[jira] [Commented] (YARN-5394) Remove bind-mount /etc/passwd to Docker Container
[ https://issues.apache.org/jira/browse/YARN-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400698#comment-15400698 ] Zhankun Tang commented on YARN-5394: The failed unit test "TestNMProxy" seems unrelated to this patch. I submitted it with both "Affects version" and "target version" set to 2.8.0. Is this the reason for the "TestNMProxy" failure? *Or maybe I need to test it against trunk*? I checked the code of the failed test case "testNMProxyRPCRetry" in branch-2.8. It seems that it should fail, since no exception contains the expected string "Failed on local exception: java.net.SocketException". In trunk, the same test case just asserts the exception's class type rather than doing a string comparison. {code:title=TestNMProxy.java(branch-2.8)} try { proxy.startContainers(allRequests); Assert.fail("should get socket exception"); } catch (IOException e) { // socket exception should be thrown immediately, without RPC retries. Assert.assertTrue(e.toString(). contains("Failed on local exception: java.net.SocketException")); } {code} {code:title=TestNMProxy.java(trunk)} try { proxy.startContainers(allRequests); Assert.fail("should get socket exception"); } catch (IOException e) { // socket exception should be thrown immediately, without RPC retries. Assert.assertTrue(e instanceof java.net.SocketException); } {code} > Remove bind-mount /etc/passwd to Docker Container > - > > Key: YARN-5394 > URL: https://issues.apache.org/jira/browse/YARN-5394 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.8.0 >Reporter: Zhankun Tang >Assignee: Zhankun Tang > Labels: patch > Attachments: YARN-5394-branch-2.8.001.patch, > YARN-5394-branch-2.8.002.patch > > > Current LCE (DockerLinuxContainerRuntime) is mounting /etc/passwd to the > container. And it seems to use the wrong file name "/etc/password" for the container. 
> {panel} > .addMountLocation("/etc/passwd", "/etc/password:ro"); > {panel} > The biggest issue of bind-mounting /etc/passwd is that it overrides the users > defined in the Docker image, which is not expected. Removing it won't affect > existing use cases.
[jira] [Commented] (YARN-5394) Remove bind-mount /etc/passwd to Docker Container
[ https://issues.apache.org/jira/browse/YARN-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400679#comment-15400679 ] Hadoop QA commented on YARN-5394: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 45s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} branch-2.8 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} branch-2.8 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} branch-2.8 
passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 5s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_101. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 38s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 34m 10s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_101 Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | | JDK v1.7.0_101 Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:5af2af1 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12820904/YARN-5394-branch-2.8.002.patch | | JIRA Issue | YARN-5394 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ce39d9dbeee8 3.13.0
[jira] [Commented] (YARN-5394) Remove bind-mount /etc/passwd to Docker Container
[ https://issues.apache.org/jira/browse/YARN-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400669#comment-15400669 ] Zhankun Tang commented on YARN-5394: Yes, sure. I have submitted it. > Remove bind-mount /etc/passwd to Docker Container > - > > Key: YARN-5394 > URL: https://issues.apache.org/jira/browse/YARN-5394 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.8.0 >Reporter: Zhankun Tang >Assignee: Zhankun Tang > Labels: patch > Attachments: YARN-5394-branch-2.8.001.patch, > YARN-5394-branch-2.8.002.patch > > > Current LCE (DockerLinuxContainerRuntime) is mounting /etc/passwd to the > container. And it seems to use the wrong file name "/etc/password" for the container. > {panel} > .addMountLocation("/etc/passwd", "/etc/password:ro"); > {panel} > The biggest issue of bind-mounting /etc/passwd is that it overrides the users > defined in the Docker image, which is not expected. Removing it won't affect > existing use cases.
[jira] [Updated] (YARN-5140) NM usercache fill up with burst of jobs leading to rapid temp IO FS fill up and potentially NM outage
[ https://issues.apache.org/jira/browse/YARN-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleksandr Kalinin updated YARN-5140: Priority: Minor (was: Major) > NM usercache fill up with burst of jobs leading to rapid temp IO FS fill up > and potentially NM outage > - > > Key: YARN-5140 > URL: https://issues.apache.org/jira/browse/YARN-5140 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Linux RHEL 6.7, Hadoop 2.7.0 >Reporter: Oleksandr Kalinin >Priority: Minor > > A burst or rapid rate of submitted jobs with substantial NM usercache > resource localization footprint may lead to rapid fill up of the NM local > temporary IO FS (/tmp by default) with negative consequences in terms of > stability. > The core issue seems to be the fact that NM continues to localize the > resources beyond the maximum local cache size > (yarn.nodemanager.localizer.cache.target-size-mb, default 10G). Since > the maximum local cache size is effectively not taken into account when > localizing new resources (note that the default cache cleanup interval is 10 min, > controlled by yarn.nodemanager.localizer.cache.cleanup.interval-ms), this > basically leads to a sort of self-destruction scenario: once /tmp FS > utilization reaches the threshold of 90%, NM will automatically de-register > from RM, effectively leading to NM outage. > This issue may take many NMs offline simultaneously and thus is > quite critical in terms of platform stability.
[jira] [Commented] (YARN-5140) NM usercache fill up with burst of jobs leading to rapid temp IO FS fill up and potentially NM outage
[ https://issues.apache.org/jira/browse/YARN-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400654#comment-15400654 ] Oleksandr Kalinin commented on YARN-5140: - A workaround for this issue is an explicit yarn.nodemanager.local-dirs configuration pointing to DFS disks. The default value '${hadoop.tmp.dir}/nm-local-dir' implies use of the local FS on the system disk in most installations. Besides the FS fill-up risk explained in the description, this is not scalable and performs poorly for any heavy localization as well as some particular workload phases like Spark-on-YARN shuffle. Perhaps those drawbacks of using a single local FS directory should be better documented in the yarn.nodemanager.local-dirs parameter description. > NM usercache fill up with burst of jobs leading to rapid temp IO FS fill up > and potentially NM outage > - > > Key: YARN-5140 > URL: https://issues.apache.org/jira/browse/YARN-5140 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Linux RHEL 6.7, Hadoop 2.7.0 >Reporter: Oleksandr Kalinin > > A burst or rapid rate of submitted jobs with substantial NM usercache > resource localization footprint may lead to rapid fill up of the NM local > temporary IO FS (/tmp by default) with negative consequences in terms of > stability. > The core issue seems to be the fact that NM continues to localize the > resources beyond the maximum local cache size > (yarn.nodemanager.localizer.cache.target-size-mb, default 10G). Since > the maximum local cache size is effectively not taken into account when > localizing new resources (note that the default cache cleanup interval is 10 min, > controlled by yarn.nodemanager.localizer.cache.cleanup.interval-ms), this > basically leads to a sort of self-destruction scenario: once /tmp FS > utilization reaches the threshold of 90%, NM will automatically de-register > from RM, effectively leading to NM outage. 
> This issue may take many NMs offline simultaneously and thus is > quite critical in terms of platform stability.
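A sketch of the workaround described above, as a yarn-site.xml fragment. The paths are illustrative placeholders, not recommendations; the second property is the disk health checker utilization threshold behind the 90% de-registration behavior mentioned in the description (shown at its default):

```xml
<!-- Spread NM local dirs across the node's data disks instead of the
     system disk default ${hadoop.tmp.dir}/nm-local-dir.
     Paths below are illustrative; adjust to the actual mount points. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/nm-local,/data2/yarn/nm-local,/data3/yarn/nm-local</value>
</property>
<!-- Utilization threshold at which a disk is marked bad; when enough
     dirs go bad the NM becomes unhealthy and de-registers from the RM. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
```

With multiple local-dirs, localization and shuffle I/O are spread across spindles, and a single full disk no longer takes the whole NM offline.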
[jira] [Commented] (YARN-4888) Changes in RM container allocation for identifying resource-requests explicitly
[ https://issues.apache.org/jira/browse/YARN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400563#comment-15400563 ] Arun Suresh commented on YARN-4888: --- Let's also add a test case (possibly a TestSchedulerRequestKey) to verify the sort order of the SchedulerKeys. > Changes in RM container allocation for identifying resource-requests > explicitly > --- > > Key: YARN-4888 > URL: https://issues.apache.org/jira/browse/YARN-4888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-4888-WIP.patch, YARN-4888-v0.patch, > YARN-4888-v2.patch, YARN-4888-v3.patch, YARN-4888-v4.patch, > YARN-4888.001.patch > > > YARN-4879 puts forward the notion of identifying allocate requests > explicitly. This JIRA is to track the changes in RM app scheduling data > structures to accomplish it. Please refer to the design doc in the parent > JIRA for details.