[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534645#comment-13534645 ] Hadoop QA commented on YARN-147: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12555695/YARN-147-v8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/230//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/230//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/230//console This message is automatically generated. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-147-v5.patch, YARN-147-v6.patch, YARN-147-v8.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13483255#comment-13483255 ] Arun C Murthy commented on YARN-147: Cancelling patch while comments are addressed, particularly the one Sid raised - we can't break LCE. Also, we need to make sure this continues to work on RHEL5/CentOS5 which doesn't have cgroups. One more thing - can we please do reviews/discussions on YARN-3 to ensure we keep track in one place? Thanks. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-147-v5.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13483268#comment-13483268 ] Andrew Ferguson commented on YARN-147: -- hi [~acmurthy], I've started posting replies on YARN-3 instead. the LCE bug is fixed and I'll post a new patch after addressing [~vinodkv]'s comments. thanks! Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-147-v5.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482751#comment-13482751 ] Vinod Kumar Vavilapalli commented on YARN-147: -- Got lost between YARN-3 and YARN-147 :) Very nice to have patch, I am willing to get it in ASAP given the amount of time it's been around. Some comments below. - yarn.nodemanager.linux-container-executor.cgroups.mount has different defaults in code and in yarn-default.xml - If the configs can be done away with (see below), ignore this comment. The descriptions for all the new configs in yarn-default.xml heavily reference code. We should simplify them to not address code and instead make them understandable by users and cross reference other related parameters. - {code}// Based on testing, ApplicationMaster executables don't terminate until // a little after the container appears to have finished. Therefore, we // wait a short bit for the cgroup to become empty before deleting it. {code} Can you explain this? Is this sleep necessary. Depending on its importance, we'll need to fix the following Id check, AMs don't always have ID equaling one. - container-executor.c: If a mount-point is already mounted, mount gives a EBUSY error, mount_cgroup() will need to be fixed to support remounts (for e.g. on NM restarts). We could unmount cgroup fs on shutdown but that isn't always guaranteed. - Please update if you have tested it on a secure setup with LCE enabled with and without cgroups. The following are already raised by others in some way, but I don't see them fixed in the latest patch. Unless I am missing something: - Not sure of the benefit of configurable yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just always mount to a path that it creates and owns? Similar comment for the hierarchy-prefix. - CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple places - updateCgroup() and createCgroup(). In the later, if cgroups are enabled, and we can't create the file, it is a critical error? One overarching improvement worth pursing immediately, either now or in follow up tickets: - Make ResourcesHandler top level. I'd like to merge the ContainersMonitor functionality with this so as to monitor/enforce memory limits also. ContainersMinotor is top-level, we should make ResourcesHandler also top-level so that other platforms don't need to create this type-hierarchy all over again when they wish to implement some or all of this functionality. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-147-v5.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480319#comment-13480319 ] Colin Patrick McCabe commented on YARN-147: --- {{get_kv_key}}, {{get_kv_value}}: thanks for implementing the API I suggested. It looks a lot better. You might consider using {{strspn}} to simplify the code a bit (it returns a number of bytes, rather than a pointer, like {{index}} does.) However, this is up to you (it's fine as-is). It's more traditional to put the doxygen into the header file than the {{.c}} file. Header files tend to define an interface, so that's where I go to find out how that interface should be used. {{mount_cgroup}}: it looks like if {{result = -1}}, the call to {{fprintf}} will print out {{errno}}, which has not been set. You could either set {{errno}} instead of {{result}} in the preceding code, or restructure the code a bit. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480467#comment-13480467 ] Colin Patrick McCabe commented on YARN-147: --- Looks good to me. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-147-v5.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479387#comment-13479387 ] Andrew Ferguson commented on YARN-147: -- hi Colin, thanks for looking at the native code. since the changes were pretty extensive, would you mind taking a careful look again? if it's easier for you, the incremental changes can be seen here: https://github.com/adferguson/hadoop-common/commits/adf-yarn-147 I hope I've faithfully implemented the new key-value API you suggested -- let me know if that's not the case. If the mount fails, I let the exception bubble all the way up to stop the NodeManager, as Tucu suggested before about a different error. The one thing I did not do is change the open / write / close methods to fopen / fprintf / fclose, as the rest of the native code does not use those methods. Which would you prefer to see: adjust my patch to use fopen, etc., or fix my use of open, etc.? Yes, I totally agree that it would be better if main.c used getopt_long; it definitely smells like another JIRA to me. :-) thanks! Andrew Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477535#comment-13477535 ] Hadoop QA commented on YARN-147: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12549422/YARN-147-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/93//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/93//console This message is automatically generated. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476220#comment-13476220 ] Hadoop QA commented on YARN-147: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12549094/YARN-147-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/89//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/89//console This message is automatically generated. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476268#comment-13476268 ] Alejandro Abdelnur commented on YARN-147: - Andrew, thanks for updating the patch. Mostly minor stuff: * CgroupsLCEResourcesHandler has an unused import for FileNotFoundException * CgroupsLCEResourcesHandler, on the double '//', on the init() method, trim the '/' is present at the beginning or end of the cgroupPrefix * CgroupsLCEResourcesHandler, parseMtab, no need have a local var for the fReader, the FileReader creation could be done directly in the BufferedReader constructor. * CgroupsLCEResourcesHandler, parseMtab, the last exception should print MTAB_FILE instead the BufferReader instance. * CgroupsLCEResourcesHandler, on parseMtab error I think we should thrown an exception to halt things. My reasoning is that if I've configured things to use cgroups, I'd expect them to be working as opposed to have a warning message lost in the logs. This would help identify misconfigurations. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475909#comment-13475909 ] Andrew Ferguson commented on YARN-147: -- hi Tucu, thanks very much for opening this new jira and reviewing the patch. I've updated a new version which addresses most of your comments. answers to the questions in your review: .bq cgroupMountPath, if there is no default we should fail if not set, can't we have a sensible default? I've added a check to fail if not set. as far as I can tell, there isn't a single default path for cgroups -- some distributions use /sys/fs/cgroup, some use /cgroup, others, /cgroups. I've even seen /mnt/cgroup (Debian perhaps?); these also vary across releases of the same distro. :-( .bq default value for cgroupPrefix has '/', here will produce a '//' in the path yes, I made that choice deliberately. I wanted to convey that cgroupPrefix can be a path (which is why I kept the '/') and when I use it, I also added a '/' in case the user did not put a '/' at the right place in the prefix. my understanding is that on Unix, '//' in a path is interpreted as '/', no? .bq Nf the filereader cannot be open/read, is this acceptable or should stop execution by throwing exception? eh, we could go either way here, but I think it's reasonable to not throw the exception. if the file can't be read, then the map from cgroup controller to path isn't built, and we already have existing checks which skip controllers which can't be found in the path (say, if the file can be read correctly, but the CPU controller isn't mounted). ok, great. I'm going to mark this as patch available and see if the findbugs warning has gone away (I can't seem to get it to run locally). thanks!! Andrew Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470908#comment-13470908 ] Hadoop QA commented on YARN-147: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548069/YARN-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/73//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/73//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/73//console This message is automatically generated. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira