[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726122#comment-13726122 ] Sandy Ryza commented on YARN-1004: -- Uploaded a patch that adds "fair", "capacity", and "fifo" to the minimum and increment configs. The patch turned out to require quite a few changes. I wasn't sure how to deal with the slots-millis job counter - it seems like it doesn't make sense in the context of MR2 - so I removed it. If this would delay the release I'm not sure it's worth it. [~tucu], you worked on the per-scheduler separation of these configs. Any thoughts? > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > Attachments: YARN-1004.patch > > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
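For readers unfamiliar with the backward-compatibility path mentioned in the description, the sketch below shows how the old scheduler-agnostic keys could be mapped to the proposed scheduler-specific names using Hadoop's Configuration deprecation mechanism. This is an illustration only, not the attached YARN-1004.patch; in particular, which scheduler's key each old name should resolve to is exactly the question the patch has to answer.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch (not YARN-1004.patch): deprecate the old, scheduler-agnostic keys
// in favor of the scheduler-specific names proposed in this issue. Shown here for the
// Fair Scheduler; the Capacity/FIFO variants would register their own mappings.
public class AllocationConfigDeprecationsSketch {
  public static void register() {
    Configuration.addDeprecation(
        "yarn.scheduler.minimum-allocation-mb",
        "yarn.scheduler.fair.minimum-allocation-mb");
    Configuration.addDeprecation(
        "yarn.scheduler.maximum-allocation-mb",
        "yarn.scheduler.fair.maximum-allocation-mb");
    Configuration.addDeprecation(
        "yarn.scheduler.increment-allocation-mb",
        "yarn.scheduler.fair.increment-allocation-mb");
  }
}
{code}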
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726124#comment-13726124 ] Sandy Ryza commented on YARN-1004: -- By which I mean [~tucu00] > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > Attachments: YARN-1004.patch > > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-758: -- Our test-patch is not smart enough: there isn't any test in the patch, but it didn't complain. I think we should have a candidate test that verifies that the code change works (and, more importantly, is useful). TestRMRestart used to be that; we could add a very simple test that hangs before the change and passes with it. > Augment MockNM to use multiple cores > > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Fix For: 2.3.0 > > Attachments: yarn-758-1.patch, yarn-758-2.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1003) Add a maxContainersPerNode config to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726118#comment-13726118 ] Vinod Kumar Vavilapalli commented on YARN-1003: --- bq. This makes it so that we don't need to use the resources we currently have as proxies for the ones we don't. If we add resources like disk and network I/O, this will become much less necessary, but I still think a high number of containers will put load on a system in ways that we won't account for. Open file descriptors for example. If my machine has 8 GB, I might want to allow a 512 MB container to fit in between a 4 GB one and a 3.5 GB one, but might not want to allow 16 512 MB containers. That doesn't sound right. While I agree that we aren't supporting all resources, this can be done by being conservative on the memory and/or cpu cores per node. Adding a new config won't make a difference, how much would you set it to be? > Add a maxContainersPerNode config to the Fair Scheduler > --- > > Key: YARN-1003 > URL: https://issues.apache.org/jira/browse/YARN-1003 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Karthik Kambatla > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1004: - Attachment: YARN-1004.patch > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > Attachments: YARN-1004.patch > > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726114#comment-13726114 ] Hudson commented on YARN-758: - SUCCESS: Integrated in Hadoop-trunk-Commit #4197 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4197/]) YARN-758. Augment MockNM to use multiple cores (Karthik Kambatla via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509086) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java > Augment MockNM to use multiple cores > > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Fix For: 2.3.0 > > Attachments: yarn-758-1.patch, yarn-758-2.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1003) Add a maxContainersPerNode config to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726111#comment-13726111 ] Sandy Ryza commented on YARN-1003: -- My thinking was not per job. Agreed that the way to do that should be through the AM. > Add a maxContainersPerNode config to the Fair Scheduler > --- > > Key: YARN-1003 > URL: https://issues.apache.org/jira/browse/YARN-1003 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Karthik Kambatla > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1003) Add a maxContainersPerNode config to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726101#comment-13726101 ] Vinod Kumar Vavilapalli commented on YARN-1003: --- I guess you are referring to restricting the number of tasks per node *per job*. If so, the implementation was always a hack in MR1. It was first wrongly put outside the scheduler, and then pushed into FairScheduler because the devs behind FS wanted the feature at least in FS, as there was no correct way to do it in MR1/JT. The correct way to do this now with YARN is to implement it inside the AM. That's what users really always wanted: to restrict a given job to run, say, only one task per node. > Add a maxContainersPerNode config to the Fair Scheduler > --- > > Key: YARN-1003 > URL: https://issues.apache.org/jira/browse/YARN-1003 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Karthik Kambatla > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1003) Add a maxContainersPerNode config to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726100#comment-13726100 ] Sandy Ryza commented on YARN-1003: -- This makes it so that we don't need to use the resources we currently have as proxies for the ones we don't. If we add resources like disk and network I/O, this will become much less necessary, but I still think a high number of containers will put load on a system in ways that we won't account for. Open file descriptors for example. If my machine has 8 GB, I might want to allow a 512 MB container to fit in between a 4 GB one and a 3.5 GB one, but might not want to allow 16 512 MB containers. > Add a maxContainersPerNode config to the Fair Scheduler > --- > > Key: YARN-1003 > URL: https://issues.apache.org/jira/browse/YARN-1003 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Karthik Kambatla > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1003) Add a maxContainersPerNode config to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726095#comment-13726095 ] Karthik Kambatla commented on YARN-1003: One of the most common questions we hear from people migrating to MR2 is how they can restrict the number of tasks per node. While they can adjust the task memory/cpu requirements for this, it is more involved compared to the MR1 model where one can set the max maps/reduces per node. > Add a maxContainersPerNode config to the Fair Scheduler > --- > > Key: YARN-1003 > URL: https://issues.apache.org/jira/browse/YARN-1003 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Karthik Kambatla > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1003) Add a maxContainersPerNode config to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726089#comment-13726089 ] Vinod Kumar Vavilapalli commented on YARN-1003: --- Why is this needed? We already have upper limits on resources per node. > Add a maxContainersPerNode config to the Fair Scheduler > --- > > Key: YARN-1003 > URL: https://issues.apache.org/jira/browse/YARN-1003 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Karthik Kambatla > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-994) HeartBeat thread in AMRMClientAsync does not handle runtime exception correctly
[ https://issues.apache.org/jira/browse/YARN-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726087#comment-13726087 ] Hadoop QA commented on YARN-994: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595350/YARN-994.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1635//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1635//console This message is automatically generated. > HeartBeat thread in AMRMClientAsync does not handle runtime exception > correctly > --- > > Key: YARN-994 > URL: https://issues.apache.org/jira/browse/YARN-994 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-994.1.patch, YARN-994.2.patch > > > YARN-654 performs sanity checks for parameters of public methods in > AMRMClient. Those may create runtime exception. > Currently, heartBeat thread in AMRMClientAsync only captures IOException and > YarnException, and will not handle Runtime Exception properly. > Possible solution can be: heartbeat thread will catch throwable and notify > the callbackhandler thread via existing savedException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-994) HeartBeat thread in AMRMClientAsync does not handle runtime exception correctly
[ https://issues.apache.org/jira/browse/YARN-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-994: --- Attachment: YARN-994.2.patch Added a test case to verify that the RuntimeException is captured and the callback handler is invoked correctly > HeartBeat thread in AMRMClientAsync does not handle runtime exception > correctly > --- > > Key: YARN-994 > URL: https://issues.apache.org/jira/browse/YARN-994 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-994.1.patch, YARN-994.2.patch > > > YARN-654 performs sanity checks for parameters of public methods in > AMRMClient. Those may create runtime exception. > Currently, heartBeat thread in AMRMClientAsync only captures IOException and > YarnException, and will not handle Runtime Exception properly. > Possible solution can be: heartbeat thread will catch throwable and notify > the callbackhandler thread via existing savedException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
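The issue description above suggests catching Throwable in the heartbeat thread and handing the failure to the callback handler via the existing savedException. The following is a minimal, self-contained sketch of that idea; the field and method names are assumptions for illustration and are not the actual AMRMClientAsync internals or the attached YARN-994.2.patch.

{code:java}
// Simplified heartbeat loop illustrating "catch Throwable and notify the callback
// handler through a saved exception". Names are illustrative only.
class HeartbeatThreadSketch extends Thread {
  private final Object handlerNotifier = new Object();
  private volatile Throwable savedException;   // read by the callback-handler thread
  private volatile boolean keepRunning = true;

  @Override
  public void run() {
    while (keepRunning) {
      try {
        doHeartbeat();                 // may throw YarnException, IOException, or a
                                       // RuntimeException from argument sanity checks
      } catch (Throwable t) {          // catch everything, not just the checked exceptions
        savedException = t;
        synchronized (handlerNotifier) {
          handlerNotifier.notifyAll(); // wake the callback-handler thread
        }
        keepRunning = false;           // stop heartbeating; the handler decides what to do
      }
    }
  }

  private void doHeartbeat() throws Exception {
    // the allocate() call against the ResourceManager would go here
  }
}
{code}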
[jira] [Commented] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726069#comment-13726069 ] Sandy Ryza commented on YARN-758: - +1 > Augment MockNM to use multiple cores > > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: yarn-758-1.patch, yarn-758-2.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-107) ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly
[ https://issues.apache.org/jira/browse/YARN-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726063#comment-13726063 ] Hadoop QA commented on YARN-107: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595345/YARN-107.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1634//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1634//console This message is automatically generated. > ClientRMService.forceKillApplication() should handle the non-RUNNING > applications properly > -- > > Key: YARN-107 > URL: https://issues.apache.org/jira/browse/YARN-107 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.0-alpha >Reporter: Devaraj K >Assignee: Xuan Gong > Attachments: YARN-107.1.patch, YARN-107.2.patch, YARN-107.3.patch, > YARN-107.4.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-107) ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly
[ https://issues.apache.org/jira/browse/YARN-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-107: --- Attachment: YARN-107.4.patch Check the state in the CLI before we send the kill command, and print out a different message if the application is already in a terminated state. Also keep forceKillApplication returning quietly if it tries to kill a non-running application > ClientRMService.forceKillApplication() should handle the non-RUNNING > applications properly > -- > > Key: YARN-107 > URL: https://issues.apache.org/jira/browse/YARN-107 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.0-alpha >Reporter: Devaraj K >Assignee: Xuan Gong > Attachments: YARN-107.1.patch, YARN-107.2.patch, YARN-107.3.patch, > YARN-107.4.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
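A rough sketch of the CLI-side behavior described in this comment, written against the public YarnClient API; the exact messages and control flow in YARN-107.4.patch may differ.

{code:java}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Sketch only: check the application's state first and only send the kill for
// applications that have not already terminated.
public class KillApplicationSketch {
  public static void kill(YarnClient client, ApplicationId appId)
      throws YarnException, IOException {
    ApplicationReport report = client.getApplicationReport(appId);
    YarnApplicationState state = report.getYarnApplicationState();
    if (state == YarnApplicationState.FINISHED
        || state == YarnApplicationState.KILLED
        || state == YarnApplicationState.FAILED) {
      System.out.println("Application " + appId + " has already finished (state: "
          + state + "); nothing to kill.");
      return;
    }
    client.killApplication(appId);   // the RM side returns quietly for non-running apps
    System.out.println("Killing application " + appId);
  }
}
{code}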
[jira] [Commented] (YARN-808) ApplicationReport does not clearly tell that the attempt is running or not
[ https://issues.apache.org/jira/browse/YARN-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726042#comment-13726042 ] Zhijie Shen commented on YARN-808: -- +1 for embedding ApplicationAttemptReport in ApplicationReport. Thinking out loud: I have a concern that, with more info to be fetched, getApplicationReport is likely to be slower and the response message is likely to be bigger. However, users may not always want to know all the info of an application, such as the embedded ApplicationAttemptReport, right? Sometimes users just want to fetch partial information of an application to speed up the response. Thoughts? > ApplicationReport does not clearly tell that the attempt is running or not > -- > > Key: YARN-808 > URL: https://issues.apache.org/jira/browse/YARN-808 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-808.1.patch > > > When an app attempt fails and is being retried, ApplicationReport immediately > gives the new attemptId and non-null values of host etc. There is no way for > clients to know that the attempt is running other than connecting to it and > timing out on invalid host. Solution would be to expose the attempt state or > return a null value for host instead of "N/A" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-994) HeartBeat thread in AMRMClientAsync does not handle runtime exception correctly
[ https://issues.apache.org/jira/browse/YARN-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726039#comment-13726039 ] Xuan Gong commented on YARN-994: Yes, I will do that > HeartBeat thread in AMRMClientAsync does not handle runtime exception > correctly > --- > > Key: YARN-994 > URL: https://issues.apache.org/jira/browse/YARN-994 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-994.1.patch > > > YARN-654 performs sanity checks for parameters of public methods in > AMRMClient. Those may create runtime exception. > Currently, heartBeat thread in AMRMClientAsync only captures IOException and > YarnException, and will not handle Runtime Exception properly. > Possible solution can be: heartbeat thread will catch throwable and notify > the callbackhandler thread via existing savedException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-855) YarnClient.init should ensure that yarn parameters are present
[ https://issues.apache.org/jira/browse/YARN-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725969#comment-13725969 ] Siddharth Seth commented on YARN-855: - The simplest would be to check the configuration type - which keeps the API stable. The reason I mentioned parameters is that apps that use YarnClient may have their own configuration type - e.g. JobConf or a HiveConf. Type information ends up getting lost even if these apps have created their configurations based on a YarnConfiguration. > YarnClient.init should ensure that yarn parameters are present > -- > > Key: YARN-855 > URL: https://issues.apache.org/jira/browse/YARN-855 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Siddharth Seth >Assignee: Abhishek Kapoor > > It currently accepts a Configuration object in init and doesn't check whether > it contains yarn parameters or is a YarnConfiguration. Should either accept > YarnConfiguration, check existence of parameters or create a > YarnConfiguration based on the configuration passed to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
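As an illustration of the "check the configuration type" option Siddharth mentions, a YarnClient.init implementation could wrap whatever it is given, as in the sketch below. This is a hedged example, not an actual patch; the helper name is made up.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: if the caller passed a plain Configuration (e.g. a JobConf or HiveConf),
// wrap it in a YarnConfiguration so yarn-default.xml/yarn-site.xml get loaded while
// the caller's settings are preserved.
public final class YarnConfigUtilSketch {
  private YarnConfigUtilSketch() {}

  public static Configuration ensureYarnConfig(Configuration conf) {
    return (conf instanceof YarnConfiguration)
        ? conf
        : new YarnConfiguration(conf);
  }
}
{code}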
[jira] [Commented] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725950#comment-13725950 ] Hadoop QA commented on YARN-758: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595317/yarn-758-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1633//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1633//console This message is automatically generated. > Augment MockNM to use multiple cores > > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: yarn-758-1.patch, yarn-758-2.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-770) NPE NodeStatusUpdaterImpl
[ https://issues.apache.org/jira/browse/YARN-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-770. -- Resolution: Invalid [~ste...@apache.org], I am closing this as invalid for now. The code has changed a lot and it isn't apparent what was causing it. Please feel free to reopen it when you run into it again. Thanks! > NPE NodeStatusUpdaterImpl > - > > Key: YARN-770 > URL: https://issues.apache.org/jira/browse/YARN-770 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Priority: Minor > > A mini yarn cluster based test just failed -NPE in the logs in > {{NodeStatusUpdaterImpl}}, which is probably a symptom of the problem, not > the cause -network trouble more likely there- but it shows there's some extra > checking for null responses. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-758) Augment MockNM to use multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-758: -- Summary: Augment MockNM to use multiple cores (was: TestRMRestart should use MockNMs with multiple cores) > Augment MockNM to use multiple cores > > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: yarn-758-1.patch, yarn-758-2.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-758) TestRMRestart should use MockNMs with multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-758: -- Attachment: yarn-758-2.patch Updated patch to address Sandy's comment. With this fix, there was no need to touch TestRMRestart, verified it passes. > TestRMRestart should use MockNMs with multiple cores > > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: yarn-758-1.patch, yarn-758-2.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725936#comment-13725936 ] Hadoop QA commented on YARN-903: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595315/YARN-903-20130731.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1632//console This message is automatically generated. > DistributedShell throwing Errors in logs after successfull completion > - > > Key: YARN-903 > URL: https://issues.apache.org/jira/browse/YARN-903 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.0.4-alpha > Environment: Ununtu 11.10 >Reporter: Abhishek Kapoor >Assignee: Omkar Vinit Joshi > Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, > YARN-903-20130718.1.patch, YARN-903-20130723.patch, > YARN-903-20130729.1.patch, YARN-903-20130730.1.patch, > YARN-903-20130731.1.patch, YARN-903-20130731.2.patch, > yarn-sunny-nodemanager-sunny-Inspiron.log > > > I have tried running DistributedShell and also used ApplicationMaster of the > same for my test. > The application is successfully running through logging some errors which > would be useful to fix. > Below are the logs from NodeManager and ApplicationMasterode > Log Snippet for NodeManager > = > 2013-07-07 13:39:18,787 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting > to ResourceManager at localhost/127.0.0.1:9990. current no. 
of attempts is 1 > 2013-07-07 13:39:19,050 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Rolling master-key for container-tokens, got key with id -325382586 > 2013-07-07 13:39:19,052 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: > Rolling master-key for nm-tokens, got key with id :1005046570 > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered > with ResourceManager as sunny-Inspiron:9993 with total resource of > > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying > ContainerManager to unblock new container-requests > 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) > 2013-07-07 13:39:35,492 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1373184544832_0001_01_01 by user sunny > 2013-07-07 13:39:35,507 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1373184544832_0001 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny > IP=127.0.0.1OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1373184544832_0001 > CONTAINERID=container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from NEW to INITING > 2013-07-07 13:39:35,512 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Adding container_1373184544832_0001_01_01 to application > application_1373184544832_0001 > 2013-07-07 13:39:35,518 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from INITING to > RUNNING > 2013-07-07 13:39:35,528 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1373184544832_0001_01_01 transitioned from NEW to > LOCALIZING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: > Resource hdfs://localhost:9000/application/test.jar transitioned from INIT > to DOWNLOADING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,675 INFO
[jira] [Commented] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725932#comment-13725932 ] Hadoop QA commented on YARN-573: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595310/YARN-573-20130731.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1631//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1631//console This message is automatically generated. > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Attachments: YARN-573-20130730.1.patch, YARN-573-20130731.1.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). Also update method should be fixed. It too is > sharing pending list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725929#comment-13725929 ] Hudson commented on YARN-502: - SUCCESS: Integrated in Hadoop-trunk-Commit #4193 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4193/]) YARN-502. Fixed a state machine issue with RMNode inside ResourceManager which was crashing scheduler. Contributed by Mayank Bansal. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509060) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java > RM crash with NPE on NODE_REMOVED event with FairScheduler > -- > > Key: YARN-502 > URL: https://issues.apache.org/jira/browse/YARN-502 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.0.3-alpha >Reporter: Lohit Vijayarenu >Assignee: Mayank Bansal > Fix For: 2.1.1-beta > > Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch, > YARN-502-trunk-3.patch > > > While running some test and adding/removing nodes, we see RM crashed with the > below exception. We are testing with fair scheduler and running > hadoop-2.0.3-alpha > {noformat} > 2013-03-22 18:54:27,015 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node :55680 as it is now LOST > 2013-03-22 18:54:27,015 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 > Node Transitioned from UNHEALTHY to LOST > 2013-03-22 18:54:27,015 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_REMOVED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375) > at java.lang.Thread.run(Thread.java:662) > 2013-03-22 18:54:27,016 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped > SelectChannelConnector@:50030 > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-854) App submission fails on secure deploy
[ https://issues.apache.org/jira/browse/YARN-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725923#comment-13725923 ] Konstantin Boudnik commented on YARN-854: - I have started the release process... > App submission fails on secure deploy > - > > Key: YARN-854 > URL: https://issues.apache.org/jira/browse/YARN-854 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Ramya Sunil >Assignee: Omkar Vinit Joshi >Priority: Blocker > Fix For: 2.1.0-beta > > Attachments: YARN-854.20130619.1.patch, YARN-854.20130619.2.patch, > YARN-854.20130619.patch, YARN-854-branch-2.0.6.patch > > > App submission on secure cluster fails with the following exception: > {noformat} > INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application > applicationID failed 2 times due to AM Container for appattemptID exited with > exitCode: -1000 due to: App initialization failed (255) with output: main : > command provided 0 > main : user is qa_user > javax.security.sasl.SaslException: DIGEST-MD5: digest response format > violation. Mismatched response. [Caused by > org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): > DIGEST-MD5: digest response format violation. Mismatched response.] > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348) > Caused by: > org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): > DIGEST-MD5: digest response format violation. Mismatched response. > at org.apache.hadoop.ipc.Client.call(Client.java:1298) > at org.apache.hadoop.ipc.Client.call(Client.java:1250) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204) > at $Proxy7.heartbeat(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) > ... 3 more > .Failing this attempt.. Failing the application. > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-903: --- Attachment: YARN-903-20130731.2.patch > DistributedShell throwing Errors in logs after successfull completion > - > > Key: YARN-903 > URL: https://issues.apache.org/jira/browse/YARN-903 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.0.4-alpha > Environment: Ununtu 11.10 >Reporter: Abhishek Kapoor >Assignee: Omkar Vinit Joshi > Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, > YARN-903-20130718.1.patch, YARN-903-20130723.patch, > YARN-903-20130729.1.patch, YARN-903-20130730.1.patch, > YARN-903-20130731.1.patch, YARN-903-20130731.2.patch, > yarn-sunny-nodemanager-sunny-Inspiron.log > > > I have tried running DistributedShell and also used ApplicationMaster of the > same for my test. > The application is successfully running through logging some errors which > would be useful to fix. > Below are the logs from NodeManager and ApplicationMasterode > Log Snippet for NodeManager > = > 2013-07-07 13:39:18,787 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting > to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 > 2013-07-07 13:39:19,050 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Rolling master-key for container-tokens, got key with id -325382586 > 2013-07-07 13:39:19,052 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: > Rolling master-key for nm-tokens, got key with id :1005046570 > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered > with ResourceManager as sunny-Inspiron:9993 with total resource of > > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying > ContainerManager to unblock new container-requests > 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) > 2013-07-07 13:39:35,492 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1373184544832_0001_01_01 by user sunny > 2013-07-07 13:39:35,507 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1373184544832_0001 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny > IP=127.0.0.1OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1373184544832_0001 > CONTAINERID=container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from NEW to INITING > 2013-07-07 13:39:35,512 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Adding container_1373184544832_0001_01_01 to application > application_1373184544832_0001 > 2013-07-07 13:39:35,518 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from INITING to > RUNNING > 2013-07-07 13:39:35,528 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container 
container_1373184544832_0001_01_01 transitioned from NEW to > LOCALIZING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: > Resource hdfs://localhost:9000/application/test.jar transitioned from INIT > to DOWNLOADING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,675 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Writing credentials to the nmPrivate file > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens. > Credentials list: > 2013-07-07 13:39:35,694 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: > Initializing user sunny > 2013-07-07 13:39:35,803 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying > from > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725913#comment-13725913 ] Vinod Kumar Vavilapalli commented on YARN-502: -- Though the explicit state check is a little unmaintainable if we have new states in future, the current change is less intrusive. The better way could have been creating new transition class, but I'm okay. +1, checking this in. > RM crash with NPE on NODE_REMOVED event with FairScheduler > -- > > Key: YARN-502 > URL: https://issues.apache.org/jira/browse/YARN-502 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.0.3-alpha >Reporter: Lohit Vijayarenu >Assignee: Mayank Bansal > Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch, > YARN-502-trunk-3.patch > > > While running some test and adding/removing nodes, we see RM crashed with the > below exception. We are testing with fair scheduler and running > hadoop-2.0.3-alpha > {noformat} > 2013-03-22 18:54:27,015 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node :55680 as it is now LOST > 2013-03-22 18:54:27,015 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 > Node Transitioned from UNHEALTHY to LOST > 2013-03-22 18:54:27,015 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_REMOVED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375) > at java.lang.Thread.run(Thread.java:662) > 2013-03-22 18:54:27,016 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped > SelectChannelConnector@:50030 > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-573: --- Attachment: YARN-573-20130731.1.patch > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Attachments: YARN-573-20130730.1.patch, YARN-573-20130731.1.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). Also update method should be fixed. It too is > sharing pending list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725909#comment-13725909 ] Omkar Vinit Joshi commented on YARN-573: Thanks [~jlowe] and [~sjlee0] for reviewing. Fixed the comments. [~sjlee0], yes, ConcurrentLinkedQueue would solve this synchronization issue altogether. I am planning to restructure it a lot when we end up fixing YARN-574. Today the update method makes two calls to findNextResource, which ideally should be one; after that, the whole code will get simplified a lot. Also, inside findNextResource we repeatedly check the same set of resources (the list) again and again until the resource gets downloaded, which ideally should only be done once. Yes, that is out of scope for this jira; I will definitely address it in another jira (YARN-574). > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Attachments: YARN-573-20130730.1.patch, YARN-573-20130731.1.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). Also update method should be fixed. It too is > sharing pending list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
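To make the ConcurrentLinkedQueue idea from this comment concrete, here is a minimal sketch of a thread-safe "pending" collection shared between the event-handling thread and the localizer thread. It is an illustration under assumed names, not the attached YARN-573 patch, which may instead synchronize on the existing list.

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: a pending-resources holder that is safe to touch from both addResource
// (event dispatcher thread) and the PublicLocalizer/PrivateLocalizer thread.
// The element type R stands in for the real localizer request event class.
class PendingResourcesSketch<R> {
  private final Queue<R> pending = new ConcurrentLinkedQueue<R>();

  // Called from the NM event-handling thread.
  void addResource(R request) {
    pending.offer(request);
  }

  // Called from the localizer thread; atomically removes the head, no explicit locking.
  R findNextResource() {
    return pending.poll();
  }
}
{code}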
[jira] [Updated] (YARN-975) Adding HDFS implementation for grouped reading and writing interfaces of history storage
[ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-975: - Attachment: YARN-975.2.patch Updated the patch: 1. Fix some logic bugs in the previous patch. 2. Added the test case. 3. Fix the javadoc warnings of ApplicationHistoryReader 4. Change reader's and writer's methods to throw IOException. > Adding HDFS implementation for grouped reading and writing interfaces of > history storage > > > Key: YARN-975 > URL: https://issues.apache.org/jira/browse/YARN-975 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-975.1.patch, YARN-975.2.patch > > > HDFS implementation should be a standard persistence strategy of history > storage -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725898#comment-13725898 ] Allen Wittenauer commented on YARN-972: --- Thought running through my head: "I hope I have a way to turn this off because it does more harm than good. I guess the alternative is just rip it out of the code base. Thank goodness I build my own releases and I'm not reliant on a vendor." > Allow requests and scheduling for fractional virtual cores > -- > > Key: YARN-972 > URL: https://issues.apache.org/jira/browse/YARN-972 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > As this idea sparked a fair amount of discussion on YARN-2, I'd like to go > deeper into the reasoning. > Currently the virtual core abstraction hides two orthogonal goals. The first > is that a cluster might have heterogeneous hardware and that the processing > power of different makes of cores can vary wildly. The second is that a > different (combinations of) workloads can require different levels of > granularity. E.g. one admin might want every task on their cluster to use at > least a core, while another might want applications to be able to request > quarters of cores. The former would configure a single vcore per core. The > latter would configure four vcores per core. > I don't think that the abstraction is a good way of handling the second goal. > Having a virtual cores refer to different magnitudes of processing power on > different clusters will make the difficult problem of deciding how many cores > to request for a job even more confusing. > Can we not handle this with dynamic oversubscription? > Dynamic oversubscription, i.e. adjusting the number of cores offered by a > machine based on measured CPU-consumption, should work as a complement to > fine-granularity scheduling. Dynamic oversubscription is never going to be > perfect, as the amount of CPU a process consumes can vary widely over its > lifetime. A task that first loads a bunch of data over the network and then > performs complex computations on it will suffer if additional CPU-heavy tasks > are scheduled on the same node because its initial CPU-utilization was low. > To guard against this, we will need to be conservative with how we > dynamically oversubscribe. If a user wants to explicitly hint to the > scheduler that their task will not use much CPU, the scheduler should be able > to take this into account. > On YARN-2, there are concerns that including floating point arithmetic in the > scheduler will slow it down. I question this assumption, and it is perhaps > worth debating, but I think we can sidestep the issue by multiplying > CPU-quantities inside the scheduler by a decently sized number like 1000 and > keep doing the computations on integers. > The relevant APIs are marked as evolving, so there's no need for the change > to delay 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
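As a rough illustration of the integer-scaling idea in the quoted description above (multiply CPU quantities by a number like 1000 and keep the scheduler arithmetic on integers), here is a small hypothetical sketch; the constant and method names are assumptions for this example and not YARN APIs.

{code:java}
// Illustrative only: represent fractional vcores as integer "milli-vcores"
// so the scheduler never performs floating-point arithmetic.
final class MilliVcores {

  static final int SCALE = 1000; // one virtual core == 1000 units

  static int fromFraction(double vcores) {
    return (int) Math.round(vcores * SCALE); // 0.25 cores -> 250 units
  }

  static boolean fits(int requestedMilliVcores, int availableMilliVcores) {
    return requestedMilliVcores <= availableMilliVcores; // pure integer compare
  }
}
{code}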
[jira] [Commented] (YARN-758) TestRMRestart should use MockNMs with multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725842#comment-13725842 ] Sandy Ryza commented on YARN-758: - There are likely a number of other tests that have similar problems. Would you be opposed to having the existing constructor scale the number of vcores by the amount of memory so that we can avoid fixing all of them individually? > TestRMRestart should use MockNMs with multiple cores > > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: yarn-758-1.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
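A minimal sketch of the scaling idea proposed in the comment above, assuming a MockNM-like test helper; the constructor signatures and the one-vcore-per-GB ratio are hypothetical, not the real MockNM API. The memory-only constructor would derive vcores from memory so existing tests get multi-core nodes without individual fixes.

{code:java}
// Illustrative only: derive vcores from memory in the legacy constructor.
class MockNMSketch {

  private final int memoryMb;
  private final int vcores;

  MockNMSketch(int memoryMb) {
    // e.g. one vcore per GB of memory, so a 4096 MB MockNM reports 4 cores
    this(memoryMb, Math.max(1, memoryMb / 1024));
  }

  MockNMSketch(int memoryMb, int vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }

  int getMemoryMb() { return memoryMb; }
  int getVcores() { return vcores; }
}
{code}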
[jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725837#comment-13725837 ] Sandy Ryza commented on YARN-972: - I should probably have tried to be clearer about what I think the goals of virtual cores are from a more zoomed-out perspective before arguing about their specifics. The problems we have been considering solving with virtual cores are: 1. "Many of the jobs on my cluster are computational simulations that use many threads per task. Many of the other jobs on my cluster are distcp's that are primarily I/O bound. Many of the jobs on my cluster are MapReduce jobs that do something like apply a transformation to text; these are single-threaded, but can saturate a core. How can we schedule these to maximize utilization and minimize harmful interference?" 2. "I recently added machines with more or beefier CPUs to my cluster. I would like to run more concurrent tasks on these machines than on other machines." 3. "I recently added machines with more or beefier CPUs to my cluster. I would like my jobs to run at predictable speeds." 4. "CPUs vary widely in the world, but I would like to be able to take my job to another cluster and have it run at a similar speed." I think (1) is the main problem we should be trying to solve. (2) is also important, and much easier to think about when the new machines have a higher number of cores, but not substantially more powerful cores. Luckily, the trend is towards more cores per machine, not more powerful cores. I think we should not be trying to solve (3) and (4). There are too many variables, the real-world utility is too small, and the goals are unrealistic. The features proposed in YARN-796 are better approaches to handling this. To these ends, here is how I think resource configurations should be used: A task should request virtual cores equal to the number of cores it thinks it can saturate. A task that runs in a single thread, no matter how CPU-intensive it is, should request a single virtual core. A task that is inherently I/O-bound, like a distcp or simple grep, should request less than a single virtual core. A task that can take advantage of multiple threads should request a number of cores equal to the number of threads it intends to take advantage of. NodeManagers should be configured with virtual cores equal to the number of physical cores on the node. If the speed of a single core varies widely within a cluster (maybe by a factor of two or more), an administrator can consider configuring more virtual cores than physical cores on the faster nodes, with the acknowledgement that task performance will still not be predictable. Virtual cores should not be used as a proxy for other resources, such as disk I/O or network I/O. We should ultimately add in disk I/O and possibly network I/O as another first-class resource, but in the meantime a config to limit the number of containers per node doesn't seem unreasonable. As Arun points out, we can realize this vision equivalently by saying that one physical core is always equal to 1000 virtual cores. However, to me this seems like an unnecessary layer of indirection for the user, and obscures the fact that virtual cores are meant to model parallelism ahead of processing power. If our only reason for considering this is performance, we should and can handle this internally. I am not obstinately opposed to going this route, but if we do I think a name like "core thousandths" would be clearer. Thoughts?
> Allow requests and scheduling for fractional virtual cores > -- > > Key: YARN-972 > URL: https://issues.apache.org/jira/browse/YARN-972 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > As this idea sparked a fair amount of discussion on YARN-2, I'd like to go > deeper into the reasoning. > Currently the virtual core abstraction hides two orthogonal goals. The first > is that a cluster might have heterogeneous hardware and that the processing > power of different makes of cores can vary wildly. The second is that a > different (combinations of) workloads can require different levels of > granularity. E.g. one admin might want every task on their cluster to use at > least a core, while another might want applications to be able to request > quarters of cores. The former would configure a single vcore per core. The > latter would configure four vcores per core. > I don't think that the abstraction is a good way of handling the second goal. > Having a virtual cores refer to different magnitudes of processing power on > different clusters will make the
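Following the guidance in the comment above (request virtual cores equal to the number of cores a task can saturate), here is a small sketch of what such capabilities look like with the existing records API; the memory figures are arbitrary examples, and the point of this JIRA is that the fractional case cannot yet be expressed.

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative only: capabilities an AM might ask for under the guidance above.
public class VcoreRequestExamples {
  public static void main(String[] args) {
    // Single-threaded but CPU-heavy task: one vcore.
    Resource singleThreaded = Resource.newInstance(1024, 1);
    // Multi-threaded simulation that can saturate four cores: four vcores.
    Resource multiThreaded = Resource.newInstance(2048, 4);
    // An I/O-bound task would ideally request less than one vcore,
    // which the integer-based API cannot express today.
    System.out.println(singleThreaded + " / " + multiThreaded);
  }
}
{code}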
[jira] [Commented] (YARN-956) [YARN-321] Add History Store interface and testable in-memory HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725818#comment-13725818 ] Zhijie Shen commented on YARN-956: -- bq. Can you also write a test for InMemory storage. This will be a good starting point to begin writing tests for AHS. Tx, [~vinodkv], I recall one problem. WRT the test, if it references the RM, I think it's better placed either in the resourcemanager or server-test sub-project, instead of the applicationhistoryservice sub-project. This is because the resourcemanager sub-project will eventually refer to applicationhistoryservice; if applicationhistoryservice already has a dependency on resourcemanager, there will be a cyclic dependency. Recall the issue in YARN-641. > [YARN-321] Add History Store interface and testable in-memory HistoryStorage > > > Key: YARN-956 > URL: https://issues.apache.org/jira/browse/YARN-956 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Attachments: YARN-956-1.patch, YARN-956-2.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-808) ApplicationReport does not clearly tell that the attempt is running or not
[ https://issues.apache.org/jira/browse/YARN-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725809#comment-13725809 ] Hadoop QA commented on YARN-808: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595279/YARN-808.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1629//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1629//console This message is automatically generated. > ApplicationReport does not clearly tell that the attempt is running or not > -- > > Key: YARN-808 > URL: https://issues.apache.org/jira/browse/YARN-808 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-808.1.patch > > > When an app attempt fails and is being retried, ApplicationReport immediately > gives the new attemptId and non-null values of host etc. There is no way for > clients to know that the attempt is running other than connecting to it and > timing out on invalid host. Solution would be to expose the attempt state or > return a null value for host instead of "N/A" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725806#comment-13725806 ] Hadoop QA commented on YARN-903: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595291/YARN-903-20130731.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1630//console This message is automatically generated. > DistributedShell throwing Errors in logs after successfull completion > - > > Key: YARN-903 > URL: https://issues.apache.org/jira/browse/YARN-903 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.0.4-alpha > Environment: Ununtu 11.10 >Reporter: Abhishek Kapoor >Assignee: Omkar Vinit Joshi > Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, > YARN-903-20130718.1.patch, YARN-903-20130723.patch, > YARN-903-20130729.1.patch, YARN-903-20130730.1.patch, > YARN-903-20130731.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log > > > I have tried running DistributedShell and also used ApplicationMaster of the > same for my test. > The application is successfully running through logging some errors which > would be useful to fix. > Below are the logs from NodeManager and ApplicationMasterode > Log Snippet for NodeManager > = > 2013-07-07 13:39:18,787 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting > to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 > 2013-07-07 13:39:19,050 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Rolling master-key for container-tokens, got key with id -325382586 > 2013-07-07 13:39:19,052 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: > Rolling master-key for nm-tokens, got key with id :1005046570 > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered > with ResourceManager as sunny-Inspiron:9993 with total resource of > > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying > ContainerManager to unblock new container-requests > 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) > 2013-07-07 13:39:35,492 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1373184544832_0001_01_01 by user sunny > 2013-07-07 13:39:35,507 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1373184544832_0001 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny > IP=127.0.0.1OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1373184544832_0001 > CONTAINERID=container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from NEW to INITING > 2013-07-07 13:39:35,512 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Adding container_1373184544832_0001_01_01 to application > 
application_1373184544832_0001 > 2013-07-07 13:39:35,518 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from INITING to > RUNNING > 2013-07-07 13:39:35,528 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1373184544832_0001_01_01 transitioned from NEW to > LOCALIZING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: > Resource hdfs://localhost:9000/application/test.jar transitioned from INIT > to DOWNLOADING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,675 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Writing credentials to the nmPrivate file > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544
[jira] [Commented] (YARN-107) ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly
[ https://issues.apache.org/jira/browse/YARN-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725802#comment-13725802 ] Vinod Kumar Vavilapalli commented on YARN-107: -- Clearly, we have two options: throw an exception if an app is already finished, or return quietly. It is indeed useful for the CLI to say that an application has already finished, but the client itself can do that: ApplicationCLI can check the state and print a different message. If it is via the API, anybody can do a state check if need be. In that light, I am not strongly opinionated either way, but we can keep forceKillApplication as one of the forced operations which usually return quietly (like rm -f). > ClientRMService.forceKillApplication() should handle the non-RUNNING > applications properly > -- > > Key: YARN-107 > URL: https://issues.apache.org/jira/browse/YARN-107 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.0-alpha >Reporter: Devaraj K >Assignee: Xuan Gong > Attachments: YARN-107.1.patch, YARN-107.2.patch, YARN-107.3.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
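To make the two options concrete, here is a hypothetical sketch of the "return quietly, like rm -f" behaviour being discussed; the enum and the method are illustrative stand-ins, not the actual ClientRMService/RMApp API.

{code:java}
// Illustrative only: force-kill as a quiet, idempotent operation.
final class ForceKillPolicySketch {

  enum AppState { RUNNING, FINISHED, FAILED, KILLED }

  /** Returns true if a kill was actually triggered, false if the app was already done. */
  static boolean forceKill(AppState state) {
    if (state == AppState.FINISHED || state == AppState.FAILED
        || state == AppState.KILLED) {
      return false; // nothing to do; a CLI could still print "application already finished"
    }
    // ... dispatch the kill event for a live application ...
    return true;
  }
}
{code}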
[jira] [Updated] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-903: --- Attachment: YARN-903-20130731.1.patch > DistributedShell throwing Errors in logs after successfull completion > - > > Key: YARN-903 > URL: https://issues.apache.org/jira/browse/YARN-903 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.0.4-alpha > Environment: Ununtu 11.10 >Reporter: Abhishek Kapoor >Assignee: Omkar Vinit Joshi > Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, > YARN-903-20130718.1.patch, YARN-903-20130723.patch, > YARN-903-20130729.1.patch, YARN-903-20130730.1.patch, > YARN-903-20130731.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log > > > I have tried running DistributedShell and also used ApplicationMaster of the > same for my test. > The application is successfully running through logging some errors which > would be useful to fix. > Below are the logs from NodeManager and ApplicationMasterode > Log Snippet for NodeManager > = > 2013-07-07 13:39:18,787 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting > to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 > 2013-07-07 13:39:19,050 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Rolling master-key for container-tokens, got key with id -325382586 > 2013-07-07 13:39:19,052 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: > Rolling master-key for nm-tokens, got key with id :1005046570 > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered > with ResourceManager as sunny-Inspiron:9993 with total resource of > > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying > ContainerManager to unblock new container-requests > 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) > 2013-07-07 13:39:35,492 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1373184544832_0001_01_01 by user sunny > 2013-07-07 13:39:35,507 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1373184544832_0001 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny > IP=127.0.0.1OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1373184544832_0001 > CONTAINERID=container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from NEW to INITING > 2013-07-07 13:39:35,512 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Adding container_1373184544832_0001_01_01 to application > application_1373184544832_0001 > 2013-07-07 13:39:35,518 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from INITING to > RUNNING > 2013-07-07 13:39:35,528 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1373184544832_0001_01_01 transitioned 
from NEW to > LOCALIZING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: > Resource hdfs://localhost:9000/application/test.jar transitioned from INIT > to DOWNLOADING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,675 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Writing credentials to the nmPrivate file > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens. > Credentials list: > 2013-07-07 13:39:35,694 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: > Initializing user sunny > 2013-07-07 13:39:35,803 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying > from > /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens > to > /ho
[jira] [Assigned] (YARN-953) [YARN-321] Change ResourceManager to use HistoryStorage to log history data
[ https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-953: Assignee: Zhijie Shen (was: Vinod Kumar Vavilapalli) > [YARN-321] Change ResourceManager to use HistoryStorage to log history data > --- > > Key: YARN-953 > URL: https://issues.apache.org/jira/browse/YARN-953 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-953.1.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-953) [YARN-321] Change ResourceManager to use HistoryStorage to log history data
[ https://issues.apache.org/jira/browse/YARN-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-953: Assignee: Vinod Kumar Vavilapalli (was: Zhijie Shen) > [YARN-321] Change ResourceManager to use HistoryStorage to log history data > --- > > Key: YARN-953 > URL: https://issues.apache.org/jira/browse/YARN-953 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: YARN-953.1.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725772#comment-13725772 ] Omkar Vinit Joshi commented on YARN-903: bq. NMContext is all about shared state. You are sharing state between ContainerManager and NodeStatusUpdater. We don't keep any shared state outside NMContext. You still having more casting in NodeManager. If you go this route, you will have to add clearTrackedFinishedContainersFromCache() to NodeStatusUpdater. Seems completely odd to have such a method on NodeStatusUpdater. I still prefer NMContext Yes added method clearTrackedFinishedContainersFromCache() to NodeStatusUpdater. bq. Yes, the missing break is the main bug. We should add a very simple unit test that validates this addition and expiry. Only at NodeStatusUpdater unit level is fine enough. yes adding one.. > DistributedShell throwing Errors in logs after successfull completion > - > > Key: YARN-903 > URL: https://issues.apache.org/jira/browse/YARN-903 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.0.4-alpha > Environment: Ununtu 11.10 >Reporter: Abhishek Kapoor >Assignee: Omkar Vinit Joshi > Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, > YARN-903-20130718.1.patch, YARN-903-20130723.patch, > YARN-903-20130729.1.patch, YARN-903-20130730.1.patch, > yarn-sunny-nodemanager-sunny-Inspiron.log > > > I have tried running DistributedShell and also used ApplicationMaster of the > same for my test. > The application is successfully running through logging some errors which > would be useful to fix. > Below are the logs from NodeManager and ApplicationMasterode > Log Snippet for NodeManager > = > 2013-07-07 13:39:18,787 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting > to ResourceManager at localhost/127.0.0.1:9990. current no. 
of attempts is 1 > 2013-07-07 13:39:19,050 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Rolling master-key for container-tokens, got key with id -325382586 > 2013-07-07 13:39:19,052 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: > Rolling master-key for nm-tokens, got key with id :1005046570 > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered > with ResourceManager as sunny-Inspiron:9993 with total resource of > > 2013-07-07 13:39:19,053 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying > ContainerManager to unblock new container-requests > 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) > 2013-07-07 13:39:35,492 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1373184544832_0001_01_01 by user sunny > 2013-07-07 13:39:35,507 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1373184544832_0001 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny > IP=127.0.0.1OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1373184544832_0001 > CONTAINERID=container_1373184544832_0001_01_01 > 2013-07-07 13:39:35,511 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from NEW to INITING > 2013-07-07 13:39:35,512 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Adding container_1373184544832_0001_01_01 to application > application_1373184544832_0001 > 2013-07-07 13:39:35,518 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: > Application application_1373184544832_0001 transitioned from INITING to > RUNNING > 2013-07-07 13:39:35,528 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1373184544832_0001_01_01 transitioned from NEW to > LOCALIZING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: > Resource hdfs://localhost:9000/application/test.jar transitioned from INIT > to DOWNLOADING > 2013-07-07 13:39:35,540 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1373184544832
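The NMContext discussion above concerns where to keep shared state about recently finished containers. A hypothetical sketch of such a cache, including a counterpart of the clearTrackedFinishedContainersFromCache() method mentioned in the comment; the names and structure are illustrative, not the actual NodeStatusUpdater/NMContext code.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: shared, thread-safe cache of recently stopped containers.
final class RecentlyStoppedContainersSketch {

  private final Map<String, Long> recentlyStopped =
      new ConcurrentHashMap<String, Long>();

  // Remember a finished container until its expiry time passes.
  void addCompletedContainer(String containerId, long retentionMs) {
    recentlyStopped.put(containerId, System.currentTimeMillis() + retentionMs);
  }

  boolean isContainerRecentlyStopped(String containerId) {
    Long expiry = recentlyStopped.get(containerId);
    return expiry != null && expiry > System.currentTimeMillis();
  }

  /** Counterpart of the clearTrackedFinishedContainersFromCache() discussed above. */
  void clearTrackedFinishedContainersFromCache() {
    recentlyStopped.clear();
  }
}
{code}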
[jira] [Updated] (YARN-758) TestRMRestart should use MockNMs with multiple cores
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-758: -- Summary: TestRMRestart should use MockNMs with multiple cores (was: Fair scheduler has some bug that causes TestRMRestart to fail) > TestRMRestart should use MockNMs with multiple cores > > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: yarn-758-1.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725751#comment-13725751 ] Srimanth Gunturi commented on YARN-1001: [~zjshen], yes, we could filter by app-type + state and count, but this is very inefficient, as the number of applications could be large, with paging potentially involved. We don't want to read the huge output just to get the counts. It would be helpful if state counts per app-type were provided. > YARN should provide per application-type and state statistics > - > > Key: YARN-1001 > URL: https://issues.apache.org/jira/browse/YARN-1001 > Project: Hadoop YARN > Issue Type: Task > Components: api >Affects Versions: 2.1.0-beta >Reporter: Srimanth Gunturi > > In Ambari we plan to show for MR2 the number of applications finished, > running, waiting, etc. It would be efficient if YARN could provide per > application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-808) ApplicationReport does not clearly tell that the attempt is running or not
[ https://issues.apache.org/jira/browse/YARN-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-808: --- Attachment: YARN-808.1.patch > ApplicationReport does not clearly tell that the attempt is running or not > -- > > Key: YARN-808 > URL: https://issues.apache.org/jira/browse/YARN-808 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-808.1.patch > > > When an app attempt fails and is being retried, ApplicationReport immediately > gives the new attemptId and non-null values of host etc. There is no way for > clients to know that the attempt is running other than connecting to it and > timing out on invalid host. Solution would be to expose the attempt state or > return a null value for host instead of "N/A" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-990) YARN REST api needs filtering capability
[ https://issues.apache.org/jira/browse/YARN-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725748#comment-13725748 ] Zhijie Shen commented on YARN-990: -- I checked the latest trunk; RMWebServices already provides filters for both applicationType*s* and state. Are you looking for more filters, or asking that state accept multiple values? > YARN REST api needs filtering capability > > > Key: YARN-990 > URL: https://issues.apache.org/jira/browse/YARN-990 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Affects Versions: 2.1.0-beta >Reporter: Srimanth Gunturi > > We wanted to find the MR2 apps which were running/finished/etc. There was no > filtering capability of the /apps endpoint. > [http://dev01:8088/ws/v1/cluster/apps?applicationType=MAPREDUCE&state=RUNNING] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725740#comment-13725740 ] Zhijie Shen commented on YARN-1001: --- [~srimanth.gunturi], I'm not fully sure it will meet Ambari's requirement, but it is worth mentioning that getApplication() can now get the applications of a certain type by supplying the type name. The count can then be easily derived from the response. > YARN should provide per application-type and state statistics > - > > Key: YARN-1001 > URL: https://issues.apache.org/jira/browse/YARN-1001 > Project: Hadoop YARN > Issue Type: Task > Components: api >Affects Versions: 2.1.0-beta >Reporter: Srimanth Gunturi > > In Ambari we plan to show for MR2 the number of applications finished, > running, waiting, etc. It would be efficient if YARN could provide per > application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
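To show what the client-side counting under discussion looks like in practice (and why the previous comment calls it inefficient, since every application report is shipped to the client just to be tallied), here is a hedged sketch; it assumes the YarnClient#getApplications(Set<String>) overload that filters by application type.

{code:java}
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Client-side counting: filter by application type on the RM, count states locally.
public class AppStateCounts {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    Map<YarnApplicationState, Integer> counts =
        new EnumMap<YarnApplicationState, Integer>(YarnApplicationState.class);
    for (ApplicationReport report
        : client.getApplications(Collections.singleton("MAPREDUCE"))) {
      YarnApplicationState state = report.getYarnApplicationState();
      Integer prev = counts.get(state);
      counts.put(state, prev == null ? 1 : prev + 1);
    }
    System.out.println(counts);
    client.stop();
  }
}
{code}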
[jira] [Commented] (YARN-770) NPE NodeStatusUpdaterImpl
[ https://issues.apache.org/jira/browse/YARN-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725739#comment-13725739 ] Xuan Gong commented on YARN-770: I ran several tests that use a miniYarn cluster, such as testAMRMClient and testNMClient, but I did not see this. > NPE NodeStatusUpdaterImpl > - > > Key: YARN-770 > URL: https://issues.apache.org/jira/browse/YARN-770 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Priority: Minor > > A mini yarn cluster based test just failed -NPE in the logs in > {{NodeStatusUpdaterImpl}}, which is probably a symptom of the problem, not > the cause -network trouble more likely there- but it shows there's some extra > checking for null responses. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-76) killApplication doesn't fully kill application master on Mac OS
[ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725719#comment-13725719 ] Xuan Gong commented on YARN-76: --- [~bowang] I ran the sleep task and killed the application. Could you tell me which command you are running to find that the application master is not killed? I ran ps -A before and after the kill command, and I did not find the application master still alive. > killApplication doesn't fully kill application master on Mac OS > --- > > Key: YARN-76 > URL: https://issues.apache.org/jira/browse/YARN-76 > Project: Hadoop YARN > Issue Type: Bug > Environment: Failed on MacOS. OK on Linux >Reporter: Bo Wang > > When client sends a ClientRMProtocol#killApplication to RM, the corresponding > AM is supposed to be killed. However, on Mac OS, the AM is still alive (w/o > any interruption). > I figured out part of the reason after some debugging. NM starts a AM with > command like "/bin/bash -c /path/to/java SampleAM". This command is executed > in a process (say with PID 0001), which starts another Java process (say with > PID 0002). When NM kills the AM, it send SIGTERM and then SIGKILL to the bash > process (PID 0001). In Linux, the death of the bash process (PID 0001) will > trigger the kill of the Java process (PID 0002). However, in Mac OS, only the > bash process is killed. The Java process is in the wild since then. > Note: on Mac OS, DefaultContainerExecutor is used rather than > LinuxContainerExecutor. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1006) Nodes list web page on the RM web UI is broken
Jian He created YARN-1006: - Summary: Nodes list web page on the RM web UI is broken Key: YARN-1006 URL: https://issues.apache.org/jira/browse/YARN-1006 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He The nodes web page, which lists all the connected nodes of the cluster, is broken. 1. The page is not showing in the correct format/style. 2. If we restart the NM, the node list is not refreshed; the newly started NM is just added to the list, and the old NM's information still remains. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725693#comment-13725693 ] Hadoop QA commented on YARN-957: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595262/YARN-957-20130731.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1628//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1628//console This message is automatically generated. > Capacity Scheduler tries to reserve the memory more than what node manager > reports. > --- > > Key: YARN-957 > URL: https://issues.apache.org/jira/browse/YARN-957 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, > YARN-957-20130730.3.patch, YARN-957-20130731.1.patch > > > I have 2 node managers. > * one with 1024 MB memory.(nm1) > * second with 2048 MB memory.(nm2) > I am submitting simple map reduce application with 1 mapper and one reducer > with 1024mb each. The steps to reproduce this are > * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's > heartbeat doesn't reach RM first). > * now submit application. As soon as it receives first node's (nm1) heartbeat > it will try to reserve memory for AM-container (2048MB). However it has only > 1024MB of memory. > * now start nm2 with 2048 MB memory. > It hangs forever... Ideally this has two potential issues. > * It should not try to reserve memory on a node manager which is never going > to give requested memory. i.e. Current max capability of node manager is > 1024MB but 2048MB is reserved on it. But it still does that. > * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available > memory. In this case if the original request was made without any locality > then scheduler should unreserve memory on nm1 and allocate requested 2048MB > container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
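The first issue in the quoted description amounts to a missing guard: never reserve a container on a node whose total capability can never satisfy the request. A minimal illustration of that check follows; the names are hypothetical, not the CapacityScheduler code.

{code:java}
// Illustrative only: skip reservations that the node can never fulfil.
final class ReservationGuardSketch {
  static boolean shouldReserve(int requestedMb, int nodeTotalMb) {
    // e.g. a 2048 MB AM container should never be reserved on a 1024 MB node.
    return requestedMb <= nodeTotalMb;
  }
}
{code}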
[jira] [Commented] (YARN-758) Fair scheduler has some bug that causes TestRMRestart to fail
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725683#comment-13725683 ] Hadoop QA commented on YARN-758: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595260/yarn-758-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1627//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1627//console This message is automatically generated. > Fair scheduler has some bug that causes TestRMRestart to fail > - > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: yarn-758-1.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-957: --- Attachment: YARN-957-20130731.1.patch > Capacity Scheduler tries to reserve the memory more than what node manager > reports. > --- > > Key: YARN-957 > URL: https://issues.apache.org/jira/browse/YARN-957 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, > YARN-957-20130730.3.patch, YARN-957-20130731.1.patch > > > I have 2 node managers. > * one with 1024 MB memory.(nm1) > * second with 2048 MB memory.(nm2) > I am submitting simple map reduce application with 1 mapper and one reducer > with 1024mb each. The steps to reproduce this are > * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's > heartbeat doesn't reach RM first). > * now submit application. As soon as it receives first node's (nm1) heartbeat > it will try to reserve memory for AM-container (2048MB). However it has only > 1024MB of memory. > * now start nm2 with 2048 MB memory. > It hangs forever... Ideally this has two potential issues. > * It should not try to reserve memory on a node manager which is never going > to give requested memory. i.e. Current max capability of node manager is > 1024MB but 2048MB is reserved on it. But it still does that. > * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available > memory. In this case if the original request was made without any locality > then scheduler should unreserve memory on nm1 and allocate requested 2048MB > container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-758) Fair scheduler has some bug that causes TestRMRestart to fail
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-758: -- Attachment: yarn-758-1.patch Added a constructor to MockNM that takes vcores as well, and updated TestRMRestart to use that. > Fair scheduler has some bug that causes TestRMRestart to fail > - > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Karthik Kambatla > Attachments: yarn-758-1.patch > > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725652#comment-13725652 ] Arun C Murthy commented on YARN-972: I'll repeat what I said in YARN-2, this is a bad idea for the same reasons I said there: # Fractional arithmetic is expensive - particularly in java; again, see MAPREDUCE-1354. Currently CS can fill up decent sized clusters in <100ms. I'm willing to bet this will be more expensive - I'd like to see benchmarks before we argue against it. Also instead of multiplying by 1000 etc., we can increase #vcores. # vcore doesn't mean it isn't predictable across clusters - I'd support an enhancement which puts in a specific value for a vcore ala EC2 (2009 xeon etc., whatever we want to pick). > Allow requests and scheduling for fractional virtual cores > -- > > Key: YARN-972 > URL: https://issues.apache.org/jira/browse/YARN-972 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > As this idea sparked a fair amount of discussion on YARN-2, I'd like to go > deeper into the reasoning. > Currently the virtual core abstraction hides two orthogonal goals. The first > is that a cluster might have heterogeneous hardware and that the processing > power of different makes of cores can vary wildly. The second is that a > different (combinations of) workloads can require different levels of > granularity. E.g. one admin might want every task on their cluster to use at > least a core, while another might want applications to be able to request > quarters of cores. The former would configure a single vcore per core. The > latter would configure four vcores per core. > I don't think that the abstraction is a good way of handling the second goal. > Having a virtual cores refer to different magnitudes of processing power on > different clusters will make the difficult problem of deciding how many cores > to request for a job even more confusing. > Can we not handle this with dynamic oversubscription? > Dynamic oversubscription, i.e. adjusting the number of cores offered by a > machine based on measured CPU-consumption, should work as a complement to > fine-granularity scheduling. Dynamic oversubscription is never going to be > perfect, as the amount of CPU a process consumes can vary widely over its > lifetime. A task that first loads a bunch of data over the network and then > performs complex computations on it will suffer if additional CPU-heavy tasks > are scheduled on the same node because its initial CPU-utilization was low. > To guard against this, we will need to be conservative with how we > dynamically oversubscribe. If a user wants to explicitly hint to the > scheduler that their task will not use much CPU, the scheduler should be able > to take this into account. > On YARN-2, there are concerns that including floating point arithmetic in the > scheduler will slow it down. I question this assumption, and it is perhaps > worth debating, but I think we can sidestep the issue by multiplying > CPU-quantities inside the scheduler by a decently sized number like 1000 and > keep doing the computations on integers. > The relevant APIs are marked as evolving, so there's no need for the change > to delay 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-985) Nodemanager should log where a resource was localized
[ https://issues.apache.org/jira/browse/YARN-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725610#comment-13725610 ] Ravi Prakash commented on YARN-985: --- Hi Omkar! bq. yeah this is a small thing but would have preferred logging in successful state transition Is there a reason for your preference? The reason I put it there is that this way I am piggybacking on an already existing Log statement. Granted, it shouldn't be much of a performance bottleneck (unless it's a fat node launching a lot of containers which are localizing a lot of files), but there's a tangible reason why I chose to do it that way. Even logging has performance implications, as we saw in HDFS-4080. bq. You can also have a debug log when file gets removed from cache to see if it is deleted or not. LocalResourcesTrackerImpl.java I'm sorry I missed your suggestion in the original post. That's a good suggestion. I'll update the patch to Log.info() when it removes the LocalizedResource. > Nodemanager should log where a resource was localized > - > > Key: YARN-985 > URL: https://issues.apache.org/jira/browse/YARN-985 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.9 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-985.patch > > > When a resource is localized, we should log WHERE on the local disk it was > localized. This helps in debugging afterwards (e.g. if the disk was to go > bad). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
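For illustration, the kind of message being proposed might look like the following; this is a hypothetical helper, not the actual LocalizedResource/LocalResourcesTrackerImpl change.

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative only: log the local path a resource was localized to.
final class LocalizationLogSketch {

  private static final Log LOG = LogFactory.getLog(LocalizationLogSketch.class);

  static void logLocalized(String remoteUri, String localPath) {
    LOG.info("Resource " + remoteUri + " was localized to " + localPath);
  }
}
{code}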
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725556#comment-13725556 ] Hitesh Shah commented on YARN-1004: --- [~vinodkv] The min allocation is no longer visible to an application. An application asks for a certain resource size and will be given either the exact size or something bigger; how much bigger is no longer communicated to the application. > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
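A small sketch of the rounding-up behaviour described in the comment above; the method name and the exact rounding rule are assumptions for illustration, and the Fair and Capacity schedulers implement this differently.

{code:java}
// Illustrative only: a request is bumped to the minimum and then rounded up
// to a multiple of the increment; the application simply receives the result.
final class AllocationNormalizerSketch {
  static int normalizeMemory(int requestedMb, int minimumMb, int incrementMb) {
    int normalized = Math.max(requestedMb, minimumMb);
    int remainder = normalized % incrementMb;
    return remainder == 0 ? normalized : normalized + (incrementMb - remainder);
  }
}
{code}

For example, normalizeMemory(1500, 1024, 512) would return 1536, the next multiple of 512 at or above the request.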
[jira] [Commented] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725571#comment-13725571 ] Sangjin Lee commented on YARN-573: -- +1 on wrapping the list with Collections.synchronizedList(). That would make the intent a bit clearer. Then you can drop synchronization on the add() call (you'd still need to use explicit synchronization for the iteration, as [~jlowe] pointed out). An alternative (which may be slightly more concurrent) is to use ConcurrentLinkedQueue. You would drop back from List to Queue, but that's all you need anyway. Besides, you would no longer need to use synchronization. > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Attachments: YARN-573-20130730.1.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). Also update method should be fixed. It too is > sharing pending list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725544#comment-13725544 ] Jason Lowe commented on YARN-573: - bq. I thought about it earlier but we are using iterator internally and we are modifying list using that iterator which won't be thread safe. Let me know if we should use Collections.synchronizedList or should synchronize on list? It's OK to iterate over a SynchronizedList as long as one explicitly synchronizes the list while iterating. This is called out in the javadocs for SynchronizedList. Synchronizing on the list will effectively block all other threads attempting to access the list until the iteration completes, because SynchronizedList methods end up using {{this}} as a mutex. bq. Yes you are right we should change the constructor to use ConcurrentMap. I will fix it together with above question/comment. I was not so much thinking the constructor should take a ConcurrentMap so much as thinking that particular constructor should simply be removed. It's not called by anything else other than the simpler constructor form, and we can just have that constructor create the ConcurrentMap directly when it initializes the {{pending}} field. > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Attachments: YARN-573-20130730.1.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). Also update method should be fixed. It too is > sharing pending list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
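The iteration idiom Jason describes, sketched against a hypothetical pending list; per the Collections.synchronizedList javadoc, the caller must hold the list's own monitor while iterating:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class PendingList<T> {
  private final List<T> pending = Collections.synchronizedList(new ArrayList<T>());

  void add(T item) {
    pending.add(item); // individual calls are already synchronized by the wrapper
  }

  void removeCompleted(Matcher<T> matcher) {
    // Iteration is not atomic: synchronize on the list itself, which is the
    // mutex the synchronized wrapper uses internally, so other threads are
    // blocked only for the duration of the scan.
    synchronized (pending) {
      for (Iterator<T> i = pending.iterator(); i.hasNext();) {
        if (matcher.matches(i.next())) {
          i.remove();
        }
      }
    }
  }

  interface Matcher<T> { boolean matches(T item); }
}
{code}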
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725530#comment-13725530 ] Vinod Kumar Vavilapalli commented on YARN-1004: --- bq. maximum-allocation is just for consistency. My thought is that it should be scheduler-specific because it's up to the scheduler to honor the config. Someone could write a new scheduler and not handle it. I haven't been following the FifoScheduler changes but this is wrong. All schedulers should honor this. Otherwise app-writers won't know what can be honoured and what cannot be. Seems like it is already agreed that min is specific to scheduler. Even there I'd make the same argument. > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725528#comment-13725528 ] Allen Wittenauer commented on YARN-972: --- bq. Nodes in probably the majority of clusters are configured with more slots than cores. This is sensible because many types of task do a lot of IO and do not even saturate half of a single core. I disagree. It isn't a sensible thing to do at all unless it *also* schedules based upon IO characteristics in addition to processor needs. The system eventually ends up in a death spiral: P1: "We need more processes on this machine because the load isn't high!" P2: "OK! I've put more of our IO intensive processes on this machine!" P1: "Weird! The CPUs are now spending more time in IO wait! Let's add more processes since we have more CPU to get it higher!" ... I posit that the reason why (at least in Hadoop 1.x systems) there are more tasks per cores is simple: the jobs are crap. They are spending more time launching JVMs and getting scheduled than they are actually executing code. It gives the illusion that Hadoop isn't scheduling efficiently. Unless one recognizes that there is a tipping point in parallelism, most users are going to keep increasing it in blind faith that "more tasks = faster always". Also, yes, I want YARN-796, but I don't think that's an orthogonal discussion. My opinion is that they are different facets of the same discussion: how do we properly schedule in a mixed load environment. It's very hard to get it 100% efficient for all cases. Some folks are going to have to suffer. If I had to pick, let it be the folks with workloads that are either terribly written or sleep a lot and don't require a lot of processor when they do wake up. > Allow requests and scheduling for fractional virtual cores > -- > > Key: YARN-972 > URL: https://issues.apache.org/jira/browse/YARN-972 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > As this idea sparked a fair amount of discussion on YARN-2, I'd like to go > deeper into the reasoning. > Currently the virtual core abstraction hides two orthogonal goals. The first > is that a cluster might have heterogeneous hardware and that the processing > power of different makes of cores can vary wildly. The second is that a > different (combinations of) workloads can require different levels of > granularity. E.g. one admin might want every task on their cluster to use at > least a core, while another might want applications to be able to request > quarters of cores. The former would configure a single vcore per core. The > latter would configure four vcores per core. > I don't think that the abstraction is a good way of handling the second goal. > Having a virtual cores refer to different magnitudes of processing power on > different clusters will make the difficult problem of deciding how many cores > to request for a job even more confusing. > Can we not handle this with dynamic oversubscription? > Dynamic oversubscription, i.e. adjusting the number of cores offered by a > machine based on measured CPU-consumption, should work as a complement to > fine-granularity scheduling. Dynamic oversubscription is never going to be > perfect, as the amount of CPU a process consumes can vary widely over its > lifetime. 
A task that first loads a bunch of data over the network and then > performs complex computations on it will suffer if additional CPU-heavy tasks > are scheduled on the same node because its initial CPU-utilization was low. > To guard against this, we will need to be conservative with how we > dynamically oversubscribe. If a user wants to explicitly hint to the > scheduler that their task will not use much CPU, the scheduler should be able > to take this into account. > On YARN-2, there are concerns that including floating point arithmetic in the > scheduler will slow it down. I question this assumption, and it is perhaps > worth debating, but I think we can sidestep the issue by multiplying > CPU-quantities inside the scheduler by a decently sized number like 1000 and > keep doing the computations on integers. > The relevant APIs are marked as evolving, so there's no need for the change > to delay 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
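The fixed-point idea at the end of the description, i.e. scaling CPU quantities so the scheduler stays on integer arithmetic, could look roughly like the following; this is purely illustrative and not code from YARN:
{code}
// Illustrative fixed-point representation of fractional vcores; not YARN code.
final class MilliVcores {
  static final int SCALE = 1000; // 1 vcore == 1000 millivcores

  static int fromVcores(double vcores) {
    return (int) Math.round(vcores * SCALE); // e.g. 0.25 vcores -> 250
  }

  // Comparisons and sums inside the scheduler stay on ints, avoiding
  // floating point in the hot path.
  static boolean fits(int requestedMilliVcores, int availableMilliVcores) {
    return requestedMilliVcores <= availableMilliVcores;
  }
}
{code}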
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725522#comment-13725522 ] Sandy Ryza commented on YARN-1004: -- [~hitesh], your reasoning makes sense to me. I'll leave the maximum and just update the minimum and increment. > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725526#comment-13725526 ] Zhijie Shen commented on YARN-966: -- bq. Potentially I don't see when we will in fact start ContainerLaunch#call without its all resources getting downloaded. YARN-906 is such a corner case. bq. This I still see should not be done via NULL check. Proper way is to set boolean flag of ContainerLaunch in the event of KILL synchronously. The original code checks state == LOCALIZED, and throws AssertError when getting the localized resources. I just modified the way to indicate the error, such that the callers of it can more easily handle the error. If you think calling getLocalizedResources() when the container is not at LOCALIZED is not wrong, I'm afraid we're in a different conversation. bq. which is completely misleading.. Indeed this occurred because user killed container not because it failed to localize resources. I don't think the message is misleading. Again, getLocalizedResources() is not allowed to be called when the container is not at LOCALIZED (at least the original code means it). So the message clearly states the problem. Please note that the killing signal is not the root problem of the thread failure here. If getLocalizedResources() were not called, the thread would still complete without exception. > The thread of ContainerLaunch#call will fail without any signal if > getLocalizedResources() is called when the container is not at LOCALIZED > --- > > Key: YARN-966 > URL: https://issues.apache.org/jira/browse/YARN-966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.1.1-beta > > Attachments: YARN-966.1.patch > > > In ContainerImpl.getLocalizedResources(), there's: > {code} > assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! > {code} > ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), > which is scheduled on a separate thread. If the container is not at LOCALIZED > (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and > fails the thread without notifying NM. Therefore, the container cannot > receive more events, which are supposed to be sent from > ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
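For context, the change being debated replaces a bare assert with an explicit state check whose failure the caller can catch and report; a hypothetical sketch of that pattern, with simplified names, not the applied patch:
{code}
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of the pattern under discussion, not the applied patch.
class ContainerSketch {
  enum State { NEW, LOCALIZING, LOCALIZED, KILLING, DONE }

  private State state = State.NEW;
  private Map<String, String> localizedResources = Collections.emptyMap();

  Map<String, String> getLocalizedResources() {
    if (state != State.LOCALIZED) {
      // Fail loudly but catchably, instead of an AssertionError that silently
      // kills the ContainerLaunch#call thread.
      throw new IllegalStateException(
          "Localized resources requested while container is at " + state);
    }
    return localizedResources; // may legitimately be empty even at LOCALIZED
  }
}
{code}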
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725518#comment-13725518 ] Vinod Kumar Vavilapalli commented on YARN-966: -- bq. Potentially I don't see when we will in fact start ContainerLaunch#call without its all resources getting downloaded. This is the most important point. bq. This I still see should not be done via NULL check. Proper way is to set boolean flag of ContainerLaunch in the event of KILL synchronously. bq. which is completely misleading.. Indeed this occurred because user killed container not because it failed to localize resources. I think we are beating this down to death. Like I said, this error SHOULD NOT happen in practice. I don't know why the assert was originally put in place. That said, I didn't want to blindly remove it without knowing why it was there to begin with. If ever we run into this in real life, we can fix the message. > The thread of ContainerLaunch#call will fail without any signal if > getLocalizedResources() is called when the container is not at LOCALIZED > --- > > Key: YARN-966 > URL: https://issues.apache.org/jira/browse/YARN-966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.1.1-beta > > Attachments: YARN-966.1.patch > > > In ContainerImpl.getLocalizedResources(), there's: > {code} > assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! > {code} > ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), > which is scheduled on a separate thread. If the container is not at LOCALIZED > (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and > fails the thread without notifying NM. Therefore, the container cannot > receive more events, which are supposed to be sent from > ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725517#comment-13725517 ] Hitesh Shah commented on YARN-1004: --- [~sandyr] Scheduler-specific configs are fine as long as they don't affect the apis and how an app needs to be written. The reason I mentioned max is that max is currently exposed in the api and therefore it either needs to be in the RM-config or be an enforced config property of each scheduler impl. The question of max being a scheduler-specific implementation choice of whether to handle it or not seems wrong. Based on the current api, it is a defined contract between an app and YARN that a container greater than max will not be allocated. Having one scheduler enforce that contract and another not enforce it means that applications now need to know what scheduler is running and change their code/run-time flow accordingly. That is a huge problem for developers trying to write applications on YARN. > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
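A generic illustration of the contract Hitesh describes: whichever scheduler is running, a request above the advertised maximum should fail the same way, so application code never has to branch on the scheduler. This is a sketch, not YARN API:
{code}
// Generic sketch, not actual YARN API: enforce the advertised maximum uniformly.
final class RequestValidator {
  static int validateMemoryAsk(int askedMb, int advertisedMaxMb) {
    if (askedMb > advertisedMaxMb) {
      // A single scheduler-independent contract lets the app rely on a fast,
      // predictable failure rather than scheduler-specific behaviour.
      throw new IllegalArgumentException("Requested " + askedMb
          + " MB exceeds maximum allocation of " + advertisedMaxMb + " MB");
    }
    return askedMb;
  }
}
{code}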
[jira] [Updated] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-573: --- Description: PublicLocalizer 1) pending accessed by addResource (part of event handling) and run method (as a part of PublicLocalizer.run() ). PrivateLocalizer 1) pending accessed by addResource (part of event handling) and findNextResource (i.remove()). Also update method should be fixed. It too is sharing pending list. was: PublicLocalizer 1) pending accessed by addResource (part of event handling) and run method (as a part of PublicLocalizer.run() ). PrivateLocalizer 1) pending accessed by addResource (part of event handling) and findNextResource (i.remove()). > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Attachments: YARN-573-20130730.1.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). Also update method should be fixed. It too is > sharing pending list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-955) [YARN-321] History Service should create the RPC server and wire it to HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725506#comment-13725506 ] Mayank Bansal commented on YARN-955: Taking it over > [YARN-321] History Service should create the RPC server and wire it to > HistoryStorage > - > > Key: YARN-955 > URL: https://issues.apache.org/jira/browse/YARN-955 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-955) [YARN-321] History Service should create the RPC server and wire it to HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-955: -- Assignee: Mayank Bansal (was: Vinod Kumar Vavilapalli) > [YARN-321] History Service should create the RPC server and wire it to > HistoryStorage > - > > Key: YARN-955 > URL: https://issues.apache.org/jira/browse/YARN-955 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-987) Implementation of *HistoryData classes to convert to *Report Objects
[ https://issues.apache.org/jira/browse/YARN-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-987: --- Summary: Implementation of *HistoryData classes to convert to *Report Objects (was: Read Interface Implementation of HistoryStorage for AHS) > Implementation of *HistoryData classes to convert to *Report Objects > > > Key: YARN-987 > URL: https://issues.apache.org/jira/browse/YARN-987 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Mayank Bansal >Assignee: Mayank Bansal > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725505#comment-13725505 ] Omkar Vinit Joshi commented on YARN-573: [~jlowe] Thanks for reviewing.. bq. LocalizerRunner.pending is accessed without synchronization in the update() method. Maybe it would be simpler to just use a SynchronizedList wrapper? That would make it a bit more robust in light of maintenance changes in the future as well. Yeah my bad.. missed update call... that should be fixed...regarding using synchronized list; I thought about it earlier but we are using iterator internally and we are modifying list using that iterator which won't be thread safe. Let me know if we should use Collections.synchronizedList or should synchronize on list? Correct me if I am wrong anywhere. bq. Nit: The PublicLocalizer constructor that takes a Map isn't really used, and as we know pending can't be just any Map for it to work properly. I'd be tempted to remove that constructor, but it's not a necessary change. Yes you are right we should change the constructor to use ConcurrentMap. I will fix it together with above question/comment. > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Attachments: YARN-573-20130730.1.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
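Jason's earlier point about the constructor, sketched with hypothetical names and value types (the real PublicLocalizer signature differs): drop the Map-taking constructor and let the remaining one create the concurrent map itself.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; the real PublicLocalizer fields and types differ.
class PublicLocalizerSketch {
  private final ConcurrentMap<String, String> pending;

  PublicLocalizerSketch() {
    // pending must be a concurrent map for correctness, so build it here
    // instead of accepting an arbitrary Map from callers.
    this.pending = new ConcurrentHashMap<String, String>();
  }
}
{code}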
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725489#comment-13725489 ] Omkar Vinit Joshi commented on YARN-966: So if user say kills the container what error we will see is {code} +RPCUtil.getRemoteException( +"Unable to get local resources when Container " + containerID + +" is at " + container.getContainerState()); {code} which is completely misleading.. Indeed this occurred because user killed container not because it failed to localize resources. > The thread of ContainerLaunch#call will fail without any signal if > getLocalizedResources() is called when the container is not at LOCALIZED > --- > > Key: YARN-966 > URL: https://issues.apache.org/jira/browse/YARN-966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.1.1-beta > > Attachments: YARN-966.1.patch > > > In ContainerImpl.getLocalizedResources(), there's: > {code} > assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! > {code} > ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), > which is scheduled on a separate thread. If the container is not at LOCALIZED > (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and > fails the thread without notifying NM. Therefore, the container cannot > receive more events, which are supposed to be sent from > ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725483#comment-13725483 ] Omkar Vinit Joshi commented on YARN-966: bq. One more consideration. Empty map can means the case that the container is at LOCALIZED, but actually there's no localized resources. Returning null is to distinguish this case with the case of fetch the localized resources when the container is not at LOCALIZED. This assumption is wrong. The state of the container has nothing to do with the localized resources map. We can call getState and know its state irrespective of this null check. Potentially I don't see when we will in fact start ContainerLaunch#call without its all resources getting downloaded. It's a different issue that the user may kill the container, resulting in a state transition. This I still see should not be done via NULL check. Proper way is to set boolean flag of ContainerLaunch in the event of KILL synchronously. > The thread of ContainerLaunch#call will fail without any signal if > getLocalizedResources() is called when the container is not at LOCALIZED > --- > > Key: YARN-966 > URL: https://issues.apache.org/jira/browse/YARN-966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.1.1-beta > > Attachments: YARN-966.1.patch > > > In ContainerImpl.getLocalizedResources(), there's: > {code} > assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! > {code} > ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), > which is scheduled on a separate thread. If the container is not at LOCALIZED > (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and > fails the thread without notifying NM. Therefore, the container cannot > receive more events, which are supposed to be sent from > ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
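The alternative Omkar describes, flagging the kill synchronously and having the launch thread check the flag, could look roughly like this hypothetical sketch (not actual NodeManager code):
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the kill-flag alternative, not actual NodeManager code.
class ContainerLaunchSketch implements Callable<Integer> {
  private final AtomicBoolean killed = new AtomicBoolean(false);

  // Invoked synchronously from the KILL event handler.
  void markKilled() {
    killed.set(true);
  }

  @Override
  public Integer call() {
    if (killed.get()) {
      // Skip the launch instead of tripping over missing localized resources.
      return -1;
    }
    // ... look up localized resources and launch the container ...
    return 0;
  }
}
{code}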
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725476#comment-13725476 ] Sandy Ryza commented on YARN-1004: -- [~hitesh], maximum-allocation is just for consistency. My thought is that it should be scheduler-specific because it's up to the scheduler to honor the config. Someone could write a new scheduler and not handle it. We have other configs, such as node-locality-threshold, that function the same for the Fair and Capacity schedulers as well. [~bikassaha], I think this is important in that in my experience having these properties that function differently for different schedulers has made explaining resource configuration really difficult. I didn't want to delay the release, but I'll upload a patch today without deprecations and we can decide where to go from there. > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-602) NodeManager should mandatorily set some Environment variables into every containers that it launches
[ https://issues.apache.org/jira/browse/YARN-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725475#comment-13725475 ] Vinod Kumar Vavilapalli commented on YARN-602: -- bq. How can I fix putting env in windows case? Is it elevant to Environment.USER(USERNAME in Windows)? That's correct. Environment.USER automatically resolves correctly depending on the OS. > NodeManager should mandatorily set some Environment variables into every > containers that it launches > > > Key: YARN-602 > URL: https://issues.apache.org/jira/browse/YARN-602 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Kenji Kikushima > Attachments: YARN-602.patch > > > NodeManager should mandatorily set some Environment variables into every > containers that it launches, such as Environment.user, Environment.pwd. If > both users and NodeManager set those variables, the value set by NM should be > used -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725459#comment-13725459 ] Sandy Ryza commented on YARN-972: - [~ste...@apache.org], bq. Those caches are the key to performance, and if you are trying to overburden the cores with work then its the cache miss penalty that kills the jobs. This is routinely done on clusters already. Nodes in probably the majority of clusters are configured with more slots than cores. This is sensible because many types of task do a lot of IO and do not even saturate half of a single core. bq. Optimising for todays 4-8 cores is a premature optimisation. In what way are we optimising for 4-8 cores? > Allow requests and scheduling for fractional virtual cores > -- > > Key: YARN-972 > URL: https://issues.apache.org/jira/browse/YARN-972 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > As this idea sparked a fair amount of discussion on YARN-2, I'd like to go > deeper into the reasoning. > Currently the virtual core abstraction hides two orthogonal goals. The first > is that a cluster might have heterogeneous hardware and that the processing > power of different makes of cores can vary wildly. The second is that a > different (combinations of) workloads can require different levels of > granularity. E.g. one admin might want every task on their cluster to use at > least a core, while another might want applications to be able to request > quarters of cores. The former would configure a single vcore per core. The > latter would configure four vcores per core. > I don't think that the abstraction is a good way of handling the second goal. > Having a virtual cores refer to different magnitudes of processing power on > different clusters will make the difficult problem of deciding how many cores > to request for a job even more confusing. > Can we not handle this with dynamic oversubscription? > Dynamic oversubscription, i.e. adjusting the number of cores offered by a > machine based on measured CPU-consumption, should work as a complement to > fine-granularity scheduling. Dynamic oversubscription is never going to be > perfect, as the amount of CPU a process consumes can vary widely over its > lifetime. A task that first loads a bunch of data over the network and then > performs complex computations on it will suffer if additional CPU-heavy tasks > are scheduled on the same node because its initial CPU-utilization was low. > To guard against this, we will need to be conservative with how we > dynamically oversubscribe. If a user wants to explicitly hint to the > scheduler that their task will not use much CPU, the scheduler should be able > to take this into account. > On YARN-2, there are concerns that including floating point arithmetic in the > scheduler will slow it down. I question this assumption, and it is perhaps > worth debating, but I think we can sidestep the issue by multiplying > CPU-quantities inside the scheduler by a decently sized number like 1000 and > keep doing the computations on integers. > The relevant APIs are marked as evolving, so there's no need for the change > to delay 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-107) ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly
[ https://issues.apache.org/jira/browse/YARN-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725429#comment-13725429 ] Hadoop QA commented on YARN-107: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595207/YARN-107.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1625//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1625//console This message is automatically generated. > ClientRMService.forceKillApplication() should handle the non-RUNNING > applications properly > -- > > Key: YARN-107 > URL: https://issues.apache.org/jira/browse/YARN-107 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.0-alpha >Reporter: Devaraj K >Assignee: Xuan Gong > Attachments: YARN-107.1.patch, YARN-107.2.patch, YARN-107.3.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725427#comment-13725427 ] Alejandro Abdelnur commented on YARN-972: - Allen, what you want seems to be YARN-796. > Allow requests and scheduling for fractional virtual cores > -- > > Key: YARN-972 > URL: https://issues.apache.org/jira/browse/YARN-972 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > As this idea sparked a fair amount of discussion on YARN-2, I'd like to go > deeper into the reasoning. > Currently the virtual core abstraction hides two orthogonal goals. The first > is that a cluster might have heterogeneous hardware and that the processing > power of different makes of cores can vary wildly. The second is that a > different (combinations of) workloads can require different levels of > granularity. E.g. one admin might want every task on their cluster to use at > least a core, while another might want applications to be able to request > quarters of cores. The former would configure a single vcore per core. The > latter would configure four vcores per core. > I don't think that the abstraction is a good way of handling the second goal. > Having a virtual cores refer to different magnitudes of processing power on > different clusters will make the difficult problem of deciding how many cores > to request for a job even more confusing. > Can we not handle this with dynamic oversubscription? > Dynamic oversubscription, i.e. adjusting the number of cores offered by a > machine based on measured CPU-consumption, should work as a complement to > fine-granularity scheduling. Dynamic oversubscription is never going to be > perfect, as the amount of CPU a process consumes can vary widely over its > lifetime. A task that first loads a bunch of data over the network and then > performs complex computations on it will suffer if additional CPU-heavy tasks > are scheduled on the same node because its initial CPU-utilization was low. > To guard against this, we will need to be conservative with how we > dynamically oversubscribe. If a user wants to explicitly hint to the > scheduler that their task will not use much CPU, the scheduler should be able > to take this into account. > On YARN-2, there are concerns that including floating point arithmetic in the > scheduler will slow it down. I question this assumption, and it is perhaps > worth debating, but I think we can sidestep the issue by multiplying > CPU-quantities inside the scheduler by a decently sized number like 1000 and > keep doing the computations on integers. > The relevant APIs are marked as evolving, so there's no need for the change > to delay 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725423#comment-13725423 ] Hadoop QA commented on YARN-643: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595205/YARN-643.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1626//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1626//console This message is automatically generated. > WHY appToken is removed both in BaseFinalTransition and > AMUnregisteredTransition AND clientToken is removed in FinalTransition and > not BaseFinalTransition > -- > > Key: YARN-643 > URL: https://issues.apache.org/jira/browse/YARN-643 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Xuan Gong > Attachments: YARN-643.1.patch, YARN-643.2.patch > > > The jira is tracking why appToken and clientToAMToken is removed separately, > and why they are distributed in different transitions, ideally there may be a > common place where these two tokens can be removed at the same time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725426#comment-13725426 ] Allen Wittenauer commented on YARN-972: --- One other thing, before I forget... these requests should also be able to be tied to queues. i.e., I should be able to set up a queue that only has to and only allows workloads that require 4GHz processors. Otherwise making this a free-for-all for users is going to turn into "who can request the fastest machines first" death match. By tying this functionality to queues, the ops team have the capability to control who gets the best gear as needed by the business. > Allow requests and scheduling for fractional virtual cores > -- > > Key: YARN-972 > URL: https://issues.apache.org/jira/browse/YARN-972 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > As this idea sparked a fair amount of discussion on YARN-2, I'd like to go > deeper into the reasoning. > Currently the virtual core abstraction hides two orthogonal goals. The first > is that a cluster might have heterogeneous hardware and that the processing > power of different makes of cores can vary wildly. The second is that a > different (combinations of) workloads can require different levels of > granularity. E.g. one admin might want every task on their cluster to use at > least a core, while another might want applications to be able to request > quarters of cores. The former would configure a single vcore per core. The > latter would configure four vcores per core. > I don't think that the abstraction is a good way of handling the second goal. > Having a virtual cores refer to different magnitudes of processing power on > different clusters will make the difficult problem of deciding how many cores > to request for a job even more confusing. > Can we not handle this with dynamic oversubscription? > Dynamic oversubscription, i.e. adjusting the number of cores offered by a > machine based on measured CPU-consumption, should work as a complement to > fine-granularity scheduling. Dynamic oversubscription is never going to be > perfect, as the amount of CPU a process consumes can vary widely over its > lifetime. A task that first loads a bunch of data over the network and then > performs complex computations on it will suffer if additional CPU-heavy tasks > are scheduled on the same node because its initial CPU-utilization was low. > To guard against this, we will need to be conservative with how we > dynamically oversubscribe. If a user wants to explicitly hint to the > scheduler that their task will not use much CPU, the scheduler should be able > to take this into account. > On YARN-2, there are concerns that including floating point arithmetic in the > scheduler will slow it down. I question this assumption, and it is perhaps > worth debating, but I think we can sidestep the issue by multiplying > CPU-quantities inside the scheduler by a decently sized number like 1000 and > keep doing the computations on integers. > The relevant APIs are marked as evolving, so there's no need for the change > to delay 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725413#comment-13725413 ] Allen Wittenauer commented on YARN-972: --- Do we have an example of a workload that needs 'fractional cores'? Is this workload even appropriate for Hadoop? As one of the probably crazy people who supports systems that do extremely non-MR things at large scales, I'd prefer to see two things implemented: * I need a processor this fast (in GHz) * I need a processor that supports this instruction set But I'd position the GHz question differently than what has been proposed above. If I say my workload needs a 1GHz processor but there is only a 4GHz processor available, then the workflow would get the whole 4GHz processor. If another workload comes in that needs a 4GHz processor but only a 2GHz processor is available, it needs to wait. Treating speed as fractions gets into another problem: 2x2GHz != 4GHz. Just as having 1/4 of 4 different cores != 1 core. Throw cpu sets into the mix and we've got a major hairball. Also, I'm a bit leery of our usage of the term core here and elsewhere in Hadoop-land. As [~ste...@apache.org] points out, there are impacts on the Lx caches when sharing load. This is also true when talking about most SMT implementations, such as Intel's HyperThreading. This means if we're talking about the Linux (and most other OSes) representation of CPU threads as being equivalent cores, there is *already* a performance hit and users are *already* getting fractional performance just by treating those as "real" cores. > Allow requests and scheduling for fractional virtual cores > -- > > Key: YARN-972 > URL: https://issues.apache.org/jira/browse/YARN-972 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > As this idea sparked a fair amount of discussion on YARN-2, I'd like to go > deeper into the reasoning. > Currently the virtual core abstraction hides two orthogonal goals. The first > is that a cluster might have heterogeneous hardware and that the processing > power of different makes of cores can vary wildly. The second is that a > different (combinations of) workloads can require different levels of > granularity. E.g. one admin might want every task on their cluster to use at > least a core, while another might want applications to be able to request > quarters of cores. The former would configure a single vcore per core. The > latter would configure four vcores per core. > I don't think that the abstraction is a good way of handling the second goal. > Having a virtual cores refer to different magnitudes of processing power on > different clusters will make the difficult problem of deciding how many cores > to request for a job even more confusing. > Can we not handle this with dynamic oversubscription? > Dynamic oversubscription, i.e. adjusting the number of cores offered by a > machine based on measured CPU-consumption, should work as a complement to > fine-granularity scheduling. Dynamic oversubscription is never going to be > perfect, as the amount of CPU a process consumes can vary widely over its > lifetime. A task that first loads a bunch of data over the network and then > performs complex computations on it will suffer if additional CPU-heavy tasks > are scheduled on the same node because its initial CPU-utilization was low. 
> To guard against this, we will need to be conservative with how we > dynamically oversubscribe. If a user wants to explicitly hint to the > scheduler that their task will not use much CPU, the scheduler should be able > to take this into account. > On YARN-2, there are concerns that including floating point arithmetic in the > scheduler will slow it down. I question this assumption, and it is perhaps > worth debating, but I think we can sidestep the issue by multiplying > CPU-quantities inside the scheduler by a decently sized number like 1000 and > keep doing the computations on integers. > The relevant APIs are marked as evolving, so there's no need for the change > to delay 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725395#comment-13725395 ] Bikas Saha commented on YARN-1004: -- It would be sad to have deprecated configs when YARN is still trying to go beta. If this is really important, lets fix it and include it in the beta RC. > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-107) ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly
[ https://issues.apache.org/jira/browse/YARN-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-107: --- Attachment: YARN-107.3.patch Fix -1 release audit. > ClientRMService.forceKillApplication() should handle the non-RUNNING > applications properly > -- > > Key: YARN-107 > URL: https://issues.apache.org/jira/browse/YARN-107 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.0-alpha >Reporter: Devaraj K >Assignee: Xuan Gong > Attachments: YARN-107.1.patch, YARN-107.2.patch, YARN-107.3.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-643: --- Attachment: YARN-643.2.patch > WHY appToken is removed both in BaseFinalTransition and > AMUnregisteredTransition AND clientToken is removed in FinalTransition and > not BaseFinalTransition > -- > > Key: YARN-643 > URL: https://issues.apache.org/jira/browse/YARN-643 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Xuan Gong > Attachments: YARN-643.1.patch, YARN-643.2.patch > > > The jira is tracking why appToken and clientToAMToken is removed separately, > and why they are distributed in different transitions, ideally there may be a > common place where these two tokens can be removed at the same time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
[ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725372#comment-13725372 ] Steve Loughran commented on YARN-972: - I'd argue against fractional core assignment not on CPU or FPU grounds but on $L1, $L2 and $L3 hit rates. Those caches are the key to performance, and if you are trying to overburden the cores with work then it's the cache miss penalty that kills the jobs. MR dodges this by having most tasks regularly blocking for IO operations, but other workloads have different characteristics. Also, as Timothy points out, no one is going to be releasing CPUs with fewer cores on them: the # will only increase. Optimising for today's 4-8 cores is a premature optimisation. > Allow requests and scheduling for fractional virtual cores > -- > > Key: YARN-972 > URL: https://issues.apache.org/jira/browse/YARN-972 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > As this idea sparked a fair amount of discussion on YARN-2, I'd like to go > deeper into the reasoning. > Currently the virtual core abstraction hides two orthogonal goals. The first > is that a cluster might have heterogeneous hardware and that the processing > power of different makes of cores can vary wildly. The second is that a > different (combinations of) workloads can require different levels of > granularity. E.g. one admin might want every task on their cluster to use at > least a core, while another might want applications to be able to request > quarters of cores. The former would configure a single vcore per core. The > latter would configure four vcores per core. > I don't think that the abstraction is a good way of handling the second goal. > Having a virtual cores refer to different magnitudes of processing power on > different clusters will make the difficult problem of deciding how many cores > to request for a job even more confusing. > Can we not handle this with dynamic oversubscription? > Dynamic oversubscription, i.e. adjusting the number of cores offered by a > machine based on measured CPU-consumption, should work as a complement to > fine-granularity scheduling. Dynamic oversubscription is never going to be > perfect, as the amount of CPU a process consumes can vary widely over its > lifetime. A task that first loads a bunch of data over the network and then > performs complex computations on it will suffer if additional CPU-heavy tasks > are scheduled on the same node because its initial CPU-utilization was low. > To guard against this, we will need to be conservative with how we > dynamically oversubscribe. If a user wants to explicitly hint to the > scheduler that their task will not use much CPU, the scheduler should be able > to take this into account. > On YARN-2, there are concerns that including floating point arithmetic in the > scheduler will slow it down. I question this assumption, and it is perhaps > worth debating, but I think we can sidestep the issue by multiplying > CPU-quantities inside the scheduler by a decently sized number like 1000 and > keep doing the computations on integers. > The relevant APIs are marked as evolving, so there's no need for the change > to delay 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-997) JMX support for node resource configuration
[ https://issues.apache.org/jira/browse/YARN-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725310#comment-13725310 ] Junping Du commented on YARN-997: - I think Luke already addressed your previous question on HADOOP-9160. So I have put HADOOP-9160 as a blocker for this jira, so that we can discuss any concerns about JMX there before we move forward on this jira. > JMX support for node resource configuration > --- > > Key: YARN-997 > URL: https://issues.apache.org/jira/browse/YARN-997 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, scheduler >Reporter: Junping Du > > Besides the YARN CLI and REST API, we can enable a JMX interface to change a node's > resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
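For illustration, a JMX hook for node resources could look roughly like the hypothetical sketch below; the MXBean interface, class, and ObjectName are all invented here and are not taken from YARN or from any patch on this jira:
{code}
// Hypothetical sketch only: these names do not exist in YARN. It just shows
// the shape of a JMX interface for adjusting a node's advertised resources.
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

public interface NodeResourceMXBean {
  int getMemoryMB();
  int getVirtualCores();
  void setResource(int memoryMB, int virtualCores); // callable from a JMX client
}

class NodeResourceConfig implements NodeResourceMXBean {
  private volatile int memoryMB = 8192;
  private volatile int virtualCores = 8;

  public int getMemoryMB() { return memoryMB; }
  public int getVirtualCores() { return virtualCores; }

  public void setResource(int memoryMB, int virtualCores) {
    // A real NodeManager would also have to report the new capacity to the RM.
    this.memoryMB = memoryMB;
    this.virtualCores = virtualCores;
  }

  static void register(NodeResourceConfig bean) throws Exception {
    ManagementFactory.getPlatformMBeanServer().registerMBean(
        bean, new ObjectName("Hadoop:service=NodeManager,name=NodeResource"));
  }
}
{code}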
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725299#comment-13725299 ] Hitesh Shah commented on YARN-1004: --- Is there a reason why maximum-allocation is also scheduler specific? > yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler > > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
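If the backwards-compatibility route mentioned in the YARN-1004 description is taken, Hadoop's existing key-deprecation mechanism could map the old generic keys onto the new scheduler-specific ones. A minimal sketch follows; it assumes the proposed key names, shows only the Fair Scheduler mappings for brevity, and does not claim to be the actual patch or say where the registration would live:
{code}
// Sketch only, not the patch: wire the old generic keys to the proposed
// scheduler-specific ones via Configuration's deprecation support.
import org.apache.hadoop.conf.Configuration;

final class SchedulerConfigDeprecations {
  static void register() {
    Configuration.addDeprecation("yarn.scheduler.minimum-allocation-mb",
        new String[] { "yarn.scheduler.fair.minimum-allocation-mb" });
    Configuration.addDeprecation("yarn.scheduler.maximum-allocation-mb",
        new String[] { "yarn.scheduler.fair.maximum-allocation-mb" });
    Configuration.addDeprecation("yarn.scheduler.increment-allocation-mb",
        new String[] { "yarn.scheduler.fair.increment-allocation-mb" });
  }
}
{code}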
[jira] [Commented] (YARN-602) NodeManager should mandatorily set some Environment variables into every container that it launches
[ https://issues.apache.org/jira/browse/YARN-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725296#comment-13725296 ] Kenji Kikushima commented on YARN-602: -- Sorry for the late update and the Windows case error. I only have Linux and OS X environments. How can I fix setting the env in the Windows case? Is it relevant to Environment.USER (USERNAME on Windows)? > NodeManager should mandatorily set some Environment variables into every > container that it launches > > > Key: YARN-602 > URL: https://issues.apache.org/jira/browse/YARN-602 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Kenji Kikushima > Attachments: YARN-602.patch > > > NodeManager should mandatorily set some Environment variables into every > container that it launches, such as Environment.user, Environment.pwd. If > both the user and the NodeManager set those variables, the value set by the NM should be > used -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
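On the Windows question, a rough illustration only (the class, method, and map handling below are invented and are not the YARN-602 patch): the NM-side logic could put its own values into the environment last, choosing USERNAME on Windows and USER elsewhere.
{code}
// Hypothetical sketch, not the YARN-602 patch: the NodeManager writes its own
// values into the container environment last, so they win over anything the
// user supplied, and it picks the platform-appropriate user variable.
import java.util.Map;

final class MandatoryContainerEnv {
  static void apply(Map<String, String> env, String user, String pwd) {
    boolean isWindows = System.getProperty("os.name").startsWith("Windows");
    env.put(isWindows ? "USERNAME" : "USER", user); // NM value overrides any user-set value
    env.put("PWD", pwd);
  }
}
{code}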
[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them
[ https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725278#comment-13725278 ] Hudson commented on YARN-948: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1504 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1504/]) YARN-948. Changed ResourceManager to validate the release container list before actually releasing them. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508609) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidContainerReleaseException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationmasterservice/TestApplicationMasterService.java > RM should validate the release container list before actually releasing them > > > Key: YARN-948 > URL: https://issues.apache.org/jira/browse/YARN-948 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.1.1-beta > > Attachments: YARN-948-20130724.patch, YARN-948-20130726.1.patch, > YARN-948-20130729.1.patch > > > At present we are blindly passing the allocate request containing containers > to be released to the scheduler. This may result in one application > releasing another application's container. > {code} > @Override > @Lock(Lock.NoLock.class) > public Allocation allocate(ApplicationAttemptId applicationAttemptId, > List<ResourceRequest> ask, List<ContainerId> release, > List<String> blacklistAdditions, List<String> blacklistRemovals) { > FiCaSchedulerApp application = getApplication(applicationAttemptId); > > > // Release containers > for (ContainerId releasedContainerId : release) { > RMContainer rmContainer = getRMContainer(releasedContainerId); > if (rmContainer == null) { > RMAuditLogger.logFailure(application.getUser(), > AuditConstants.RELEASE_CONTAINER, > "Unauthorized access or invalid container", "CapacityScheduler", > "Trying to release container not owned by app or with invalid > id", > application.getApplicationId(), releasedContainerId); > } > completedContainer(rmContainer, > SchedulerUtils.createAbnormalContainerStatus( > releasedContainerId, > SchedulerUtils.RELEASED_CONTAINER), > RMContainerEventType.RELEASED); > } > {code} > Current checks are not sufficient and we should prevent this. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
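The validation the YARN-948 commit describes boils down to checking container ownership before handing the release list to the scheduler. A rough sketch of that check is below; the method signature and the exception constructor are assumed for illustration, not copied from the patch:
{code}
// Sketch of the ownership check, with assumed signatures: reject any released
// ContainerId whose application attempt does not match the caller.
void validateContainerReleaseRequest(List<ContainerId> release,
    ApplicationAttemptId attemptId) throws InvalidContainerReleaseException {
  for (ContainerId containerId : release) {
    if (!attemptId.equals(containerId.getApplicationAttemptId())) {
      throw new InvalidContainerReleaseException("Cannot release container "
          + containerId + " not belonging to application attempt " + attemptId);
    }
  }
}
{code}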
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725275#comment-13725275 ] Hudson commented on YARN-966: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1504 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1504/]) YARN-966. Fixed ContainerLaunch to not fail quietly when there are no localized resources due to some other failure. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508688) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java > The thread of ContainerLaunch#call will fail without any signal if > getLocalizedResources() is called when the container is not at LOCALIZED > --- > > Key: YARN-966 > URL: https://issues.apache.org/jira/browse/YARN-966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.1.1-beta > > Attachments: YARN-966.1.patch > > > In ContainerImpl.getLocalizedResources(), there's: > {code} > assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! > {code} > ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), > which is scheduled on a separate thread. If the container is not at LOCALIZED > (e.g. it is at KILLING, see YARN-906), an AssertionError will be thrown, > failing the thread without notifying the NM. Therefore, the container cannot > receive further events, which are supposed to be sent from > ContainerLaunch.call(), and cannot move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
[ https://issues.apache.org/jira/browse/YARN-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725272#comment-13725272 ] Jason Lowe commented on YARN-573: - Thanks for picking this up, Omkar. A couple of comments: * LocalizerRunner.pending is accessed without synchronization in the update() method. Maybe it would be simpler to just use a SynchronizedList wrapper? That would make it a bit more robust in light of maintenance changes in the future as well. * Nit: The PublicLocalizer constructor that takes a Map isn't really used, and as we know {{pending}} can't be just any Map for it to work properly. I'd be tempted to remove that constructor, but it's not a necessary change. > Shared data structures in Public Localizer and Private Localizer are not > Thread safe. > - > > Key: YARN-573 > URL: https://issues.apache.org/jira/browse/YARN-573 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi >Priority: Critical > Attachments: YARN-573-20130730.1.patch > > > PublicLocalizer > 1) pending accessed by addResource (part of event handling) and run method > (as a part of PublicLocalizer.run() ). > PrivateLocalizer > 1) pending accessed by addResource (part of event handling) and > findNextResource (i.remove()). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
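To illustrate the first suggestion in the comment above (the field and method names follow the comment; the element type is simplified to Object here, so this is a sketch rather than the actual ResourceLocalizationService code): a synchronized wrapper covers individual calls, while iteration still needs an explicit synchronized block.
{code}
// Sketch of the Collections.synchronizedList suggestion, not YARN code.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PendingListExample {
  // shared between the event-handling path (addResource) and update()
  private final List<Object> pending =
      Collections.synchronizedList(new ArrayList<Object>());

  void addResource(Object request) {
    pending.add(request); // the wrapper synchronizes individual operations
  }

  void update() {
    synchronized (pending) { // iteration still requires manual synchronization
      for (Object request : pending) {
        // inspect or remove completed requests here
      }
    }
  }
}
{code}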
[jira] [Commented] (YARN-107) ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly
[ https://issues.apache.org/jira/browse/YARN-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725237#comment-13725237 ] Jason Lowe commented on YARN-107: - I still think throwing an exception for this is a mistake and makes the API harder to wield. What is the use-case where throwing the exception is necessary? > ClientRMService.forceKillApplication() should handle the non-RUNNING > applications properly > -- > > Key: YARN-107 > URL: https://issues.apache.org/jira/browse/YARN-107 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.0-alpha >Reporter: Devaraj K >Assignee: Xuan Gong > Attachments: YARN-107.1.patch, YARN-107.2.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725211#comment-13725211 ] Hudson commented on YARN-966: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1477 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1477/]) YARN-966. Fixed ContainerLaunch to not fail quietly when there are no localized resources due to some other failure. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508688) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java > The thread of ContainerLaunch#call will fail without any signal if > getLocalizedResources() is called when the container is not at LOCALIZED > --- > > Key: YARN-966 > URL: https://issues.apache.org/jira/browse/YARN-966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.1.1-beta > > Attachments: YARN-966.1.patch > > > In ContainerImpl.getLocalizedResources(), there's: > {code} > assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! > {code} > ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), > which is scheduled on a separate thread. If the container is not at LOCALIZED > (e.g. it is at KILLING, see YARN-906), an AssertionError will be thrown, > failing the thread without notifying the NM. Therefore, the container cannot > receive further events, which are supposed to be sent from > ContainerLaunch.call(), and cannot move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them
[ https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725215#comment-13725215 ] Hudson commented on YARN-948: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1477 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1477/]) YARN-948. Changed ResourceManager to validate the release container list before actually releasing them. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508609) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidContainerReleaseException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationmasterservice/TestApplicationMasterService.java > RM should validate the release container list before actually releasing them > > > Key: YARN-948 > URL: https://issues.apache.org/jira/browse/YARN-948 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.1.1-beta > > Attachments: YARN-948-20130724.patch, YARN-948-20130726.1.patch, > YARN-948-20130729.1.patch > > > At present we are blindly passing the allocate request containing containers > to be released to the scheduler. This may result in one application > releasing another application's container. > {code} > @Override > @Lock(Lock.NoLock.class) > public Allocation allocate(ApplicationAttemptId applicationAttemptId, > List<ResourceRequest> ask, List<ContainerId> release, > List<String> blacklistAdditions, List<String> blacklistRemovals) { > FiCaSchedulerApp application = getApplication(applicationAttemptId); > > > // Release containers > for (ContainerId releasedContainerId : release) { > RMContainer rmContainer = getRMContainer(releasedContainerId); > if (rmContainer == null) { > RMAuditLogger.logFailure(application.getUser(), > AuditConstants.RELEASE_CONTAINER, > "Unauthorized access or invalid container", "CapacityScheduler", > "Trying to release container not owned by app or with invalid > id", > application.getApplicationId(), releasedContainerId); > } > completedContainer(rmContainer, > SchedulerUtils.createAbnormalContainerStatus( > releasedContainerId, > SchedulerUtils.RELEASED_CONTAINER), > RMContainerEventType.RELEASED); > } > {code} > Current checks are not sufficient and we should prevent this. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1005) Log aggregators should check for FSDataOutputStream close before renaming to aggregated file.
Rohith Sharma K S created YARN-1005: --- Summary: Log aggregators should check for FSDataOutputStream close before renaming to aggregated file. Key: YARN-1005 URL: https://issues.apache.org/jira/browse/YARN-1005 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 2.0.5-alpha Reporter: Rohith Sharma K S If AggregatedLogFormat.LogWriter.closeWriter() is interrupted, then "remoteNodeTmpLogFileForApp" is renamed to the "remoteNodeLogFileForApp" file. This renamed file does not contain valid aggregated logs. There can be a situation where the renamed file is not in BCFile format at all. This causes an error when viewing the logs from the JobHistoryServer web page. {noformat} 2013-07-27 18:51:14,787 ERROR org.apache.hadoop.yarn.webapp.View: Error getting logs for job_1374918614757_0002 java.io.IOException: Not a valid BCFile. at org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927) at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:628) at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:337) at org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:89) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:64) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:74) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
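A rough sketch of the guard the summary asks for is below; the class and method names are illustrative (this is not a YARN-1005 patch), but it shows the intended order: rename the temporary file only after the stream has closed cleanly, and discard it otherwise.
{code}
// Illustrative only: rename the tmp aggregated log file to its final name only
// after the FSDataOutputStream has been closed successfully; otherwise delete
// the partial file so the JobHistoryServer never reads a non-BCFile.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class SafeAggregatedLogPublish {
  static void closeAndPublish(FileSystem fs, FSDataOutputStream out,
      Path remoteNodeTmpLogFileForApp, Path remoteNodeLogFileForApp) throws IOException {
    boolean closed = false;
    try {
      out.close(); // may fail or be interrupted
      closed = true;
    } finally {
      if (closed) {
        fs.rename(remoteNodeTmpLogFileForApp, remoteNodeLogFileForApp);
      } else {
        fs.delete(remoteNodeTmpLogFileForApp, true); // never publish a partial file
      }
    }
  }
}
{code}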
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725111#comment-13725111 ] Steve Loughran commented on YARN-445: - I like Chris's #3 option, as it allows you to add things like a graceful shutdown to a piece of code that you don't want to/can't change. The command would have to run with the same path & other env params as the original source if you want to do things like exec an HBase decommission command. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Jason Lowe > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However, that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
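For context, what "send SIGQUIT to a container" amounts to at the OS level on Linux is sketched below. This is not a YARN API and not any patch on this jira; it simply assumes the NM already knows the container's pid:
{code}
// OS-level illustration only: SIGQUIT makes a JVM dump its thread stacks to
// stderr without exiting, which is what the jstack-on-timeout feature needs.
import java.io.IOException;

final class SignalProcess {
  static void sendQuit(String pid) throws IOException, InterruptedException {
    Process p = new ProcessBuilder("kill", "-QUIT", pid).inheritIO().start();
    if (p.waitFor() != 0) {
      throw new IOException("kill -QUIT " + pid + " exited with non-zero status");
    }
  }
}
{code}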
[jira] [Commented] (YARN-770) NPE NodeStatusUpdaterImpl
[ https://issues.apache.org/jira/browse/YARN-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725109#comment-13725109 ] Steve Loughran commented on YARN-770: - I was just running a test which used a MiniYARN cluster; it showed this NPE in the stack when running a test case while the network was playing up. I haven't seen it since that single incident. Is there a codepath that could lead to this NPE-ing condition if some previous operation failed, with a timeout or IOException? > NPE NodeStatusUpdaterImpl > - > > Key: YARN-770 > URL: https://issues.apache.org/jira/browse/YARN-770 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Priority: Minor > > A mini YARN cluster based test just failed - NPE in the logs in > {{NodeStatusUpdaterImpl}}, which is probably a symptom of the problem, not > the cause - network trouble is more likely there - but it shows that some extra > checking for null responses is needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
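The "extra checking for null responses" would be a small defensive guard; a sketch of the idea, in the fragment style used elsewhere in this thread (the surrounding heartbeat loop is paraphrased, not the real NodeStatusUpdaterImpl code):
{code}
// Sketch only: guard the heartbeat response before dereferencing it, so a
// transient network failure logs a warning instead of killing the updater thread.
NodeHeartbeatResponse response = resourceTracker.nodeHeartbeat(request);
if (response == null) {
  LOG.warn("Received a null heartbeat response from the RM; skipping this cycle");
  return; // retry on the next heartbeat interval
}
{code}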
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725099#comment-13725099 ] Hudson commented on YARN-966: - SUCCESS: Integrated in Hadoop-Yarn-trunk #287 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/287/]) YARN-966. Fixed ContainerLaunch to not fail quietly when there are no localized resources due to some other failure. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508688) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java > The thread of ContainerLaunch#call will fail without any signal if > getLocalizedResources() is called when the container is not at LOCALIZED > --- > > Key: YARN-966 > URL: https://issues.apache.org/jira/browse/YARN-966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.1.1-beta > > Attachments: YARN-966.1.patch > > > In ContainerImpl.getLocalizedResources(), there's: > {code} > assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! > {code} > ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), > which is scheduled on a separate thread. If the container is not at LOCALIZED > (e.g. it is at KILLING, see YARN-906), an AssertionError will be thrown, > failing the thread without notifying the NM. Therefore, the container cannot > receive further events, which are supposed to be sent from > ContainerLaunch.call(), and cannot move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-948) RM should validate the release container list before actually releasing them
[ https://issues.apache.org/jira/browse/YARN-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725103#comment-13725103 ] Hudson commented on YARN-948: - SUCCESS: Integrated in Hadoop-Yarn-trunk #287 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/287/]) YARN-948. Changed ResourceManager to validate the release container list before actually releasing them. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508609) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidContainerReleaseException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationmasterservice/TestApplicationMasterService.java > RM should validate the release container list before actually releasing them > > > Key: YARN-948 > URL: https://issues.apache.org/jira/browse/YARN-948 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.1.1-beta > > Attachments: YARN-948-20130724.patch, YARN-948-20130726.1.patch, > YARN-948-20130729.1.patch > > > At present we are blindly passing the allocate request containing containers > to be released to the scheduler. This may result in one application > releasing another application's container. > {code} > @Override > @Lock(Lock.NoLock.class) > public Allocation allocate(ApplicationAttemptId applicationAttemptId, > List<ResourceRequest> ask, List<ContainerId> release, > List<String> blacklistAdditions, List<String> blacklistRemovals) { > FiCaSchedulerApp application = getApplication(applicationAttemptId); > > > // Release containers > for (ContainerId releasedContainerId : release) { > RMContainer rmContainer = getRMContainer(releasedContainerId); > if (rmContainer == null) { > RMAuditLogger.logFailure(application.getUser(), > AuditConstants.RELEASE_CONTAINER, > "Unauthorized access or invalid container", "CapacityScheduler", > "Trying to release container not owned by app or with invalid > id", > application.getApplicationId(), releasedContainerId); > } > completedContainer(rmContainer, > SchedulerUtils.createAbnormalContainerStatus( > releasedContainerId, > SchedulerUtils.RELEASED_CONTAINER), > RMContainerEventType.RELEASED); > } > {code} > Current checks are not sufficient and we should prevent this. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler
Sandy Ryza created YARN-1004: Summary: yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler Key: YARN-1004 URL: https://issues.apache.org/jira/browse/YARN-1004 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific configuration, and functions differently for the Fair and Capacity schedulers, it would be less confusing for the config names to include the scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, yarn.scheduler.capacity.minimum-allocation-mb, and yarn.scheduler.fifo.minimum-allocation-mb. The same goes for yarn.scheduler.increment-allocation-mb, which only exists for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for consistency. If we wish to preserve backwards compatibility, we can deprecate the old configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira