[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: (was: YARN-3643.61.patch) Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3463.50.patch, YARN-3463.61.patch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.60.patch Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, YARN-3318.58.patch, YARN-3318.59.patch, YARN-3318.60.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494606#comment-14494606 ] Craig Welch commented on YARN-3318: --- bq. Beyond SchedulerApplicationAttempt which is pending YARN-3361, few comments on latest patch: I think you misunderstood, the patch doesn't depend on 3361, but after 3361 is in, some things should be removed from this patch. In any case, I decided that it really belonged in the integration patch, [YARN-3463], so I've dropped it from here and it will be committed there. bq. 1) CACHED_USED/CACHED_PENDING aren't used by anybody, are they pending YARN-3361 as well? No, that was a miss during the ResourceUsage usage changes! Something which could affect functionality! Amazing, fixed. bq. 2) AbstractComparatorOrderingPolicy doesn't handle locks, I suggest adding a synchronized lock to all methods if you think it will only be used in a single-thread scenario Since the API returns iterators which must be externally synchronized, OrderingPolicy makes it clear in documentation that the burden for synchronization rests with the user (the schedulers). That's the threading model, so synchronizing here would be pointless. bq. 3) FifoComparator, will it be used by FairOrderingPolicy as well? If so, better to make it a separate class sure, done bq. 4) How about calling getInfo getStatusMessage, since "info" is too generic? And add a comment to indicate it will be used for logger printing. sure, done bq. 5) getComparator of AbstractComparatorOrderingPolicy is @VisibleForTest? 
sure, done Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, YARN-3318.58.patch, YARN-3318.59.patch, YARN-3318.60.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
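The comment above agrees to split FifoComparator into its own class so FairOrderingPolicy can reuse it. A minimal sketch of what that standalone comparator might look like (the SchedulableEntity type here is a hypothetical stand-in with just an id accessor; the real interface in the patch has more to it):

```java
import java.util.Comparator;

// Hypothetical stand-in for the framework's schedulable type; the actual
// interface in the YARN-3318 patch is richer (resource usage, etc).
interface SchedulableEntity {
    String getId();
}

// FIFO ordering: application ids embed the cluster timestamp and a sequence
// number, so lexical comparison by id is effectively arrival order.
class FifoComparator implements Comparator<SchedulableEntity> {
    @Override
    public int compare(SchedulableEntity a, SchedulableEntity b) {
        return a.getId().compareTo(b.getId());
    }
}
```

Kept as a separate class, this comparator can be composed into other policies (e.g. as a tiebreaker) rather than being locked inside FifoOrderingPolicy.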
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.61.patch Fix findbugs recurrence due to class name change Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, YARN-3318.58.patch, YARN-3318.59.patch, YARN-3318.60.patch, YARN-3318.61.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494892#comment-14494892 ] Craig Welch commented on YARN-3463: --- bq. 1) Preemption policy changes seem not correct to me... So, I believe the behavior for FIFO should be exactly as it was before - and all of the preemption tests were passing with the combined patch, so I think this is the case. Fairness preemption would be handled on [YARN-3319]. I don't mind moving the final integration for preemption into another jira, but I don't believe the concern is correct; there is no behavioral change for FIFO. bq. 2) WebUI, REST API and CLI changes are public APIs and related to core changes in CS... There are no REST API or CLI changes in the patch anymore. We agreed on [YARN-3318] https://issues.apache.org/jira/browse/YARN-3318?focusedCommentId=14393347&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14393347 that the WebUI changes should stay with the initial integration - it's very important/needed to be able to confirm that configuration was accomplished properly; without it there is no way to tell what policy is active. bq. So I suggest only leave core changes for CS including configuration So, given the WebUI point above, I think this is already the case, with the possible exception of preemption which, again, I believe has seen no behavior change for FIFO, which is all we have at this time. Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3463.50.patch, YARN-3643.58.patch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3319.61.patch Updated, matches/should apply and work with [YARN-3318] .61.patch Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.61.patch, YARN-3463.50.patch, YARN-3643.58.patch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.58.patch Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with the least current usage, very similar to the FairScheduler's FairSharePolicy. The policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). It will include conditional support for a sizeBasedWeight-style adjustment. Optionally, controlled by a configuration flag to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by: Math.log1p(app memory demand) / Math.log(2). In cases where the comparison is indeterminate (two applications are equal after it), behavior falls back to comparison based on the application id, which is generally lexically FIFO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
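The sizeBasedWeight adjustment in the description above can be sketched as follows. This is an illustration only: the method name is assumed, raw memory longs stand in for the real Resource objects, and how the actual patch handles zero demand is not stated in the description.

```java
// Sketch of the sizeBasedWeight adjustment from the YARN-3319 description:
// effective usage = used / (log1p(demand) / log(2)), i.e. used / log2(1 + demand).
class SizeBasedWeight {
    // Larger demand => larger divisor => smaller effective usage, which
    // boosts big applications against the natural small-app preference.
    static double adjustedUsage(long usedMemory, long demandMemory) {
        double divisor = Math.log1p(demandMemory) / Math.log(2); // log2(1 + demand)
        // Guard for zero demand (divisor would be 0); this guard is an
        // assumption, not something stated in the issue description.
        return divisor > 0 ? usedMemory / divisor : usedMemory;
    }
}
```

For example, an application demanding 3 units has divisor log2(4) = 2, so its usage counts half as much in the fairness comparison as that of an application with demand 1 (divisor log2(2) = 1).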
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.58.patch Missed attaching ResourceUsage Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, YARN-3318.58.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.57.patch Better ResourceUsage usage Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3643.58.patch Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3463.50.patch, YARN-3643.58.patch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.59.patch checkpatch fixes Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, YARN-3318.58.patch, YARN-3318.59.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491959#comment-14491959 ] Craig Welch commented on YARN-3318: --- Added comments to CompoundComparator, re-introduced getId (in lieu of getName), switched to ResourceUsage - to avoid an unnecessary dependency on [YARN-3361], SchedulerApplicationAttempt manages pending in a way it won't have to long term; this doesn't affect the API and allows these to be committed in any order. Sticking with Scheduling instead of Cached as suggested earlier by [~vinodkv] to keep its purpose clear (Cached is too general) and because it can't be used as a generalized cache of the values; the lifecycle is tied to use by OrderingPolicies. Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, YARN-3318.53.patch, YARN-3318.56.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492615#comment-14492615 ] Craig Welch commented on YARN-3318: --- Looking again at using ResourceUsage instead of the initial use of application demand and consumption: while it may be preferable for future cases like queues with node label aware policies, there are deficiencies which need to be addressed to use it for the initial case, and it makes it more complex to do so. In fact, for the initial case, this approach is inferior. ResourceUsage is still a bit rough and incomplete: get does not properly handle the ANY/ALL case, which is what we need for application fairness - otherwise, applications whose resource requests are labeled something other than NO_LABEL will be erroneously preferred for scheduling in the fair case. The prior approach was working with full consumption and demand, did not have this issue, and did not require additional change to support fairness properly. Even supporting ANY/ALL in ResourceUsage is a little tricky, as I see no reason why someone could not set values on ResourceUsage using the ANY label definition, and then there is a question as to what is the proper behavior for an ANY get request - should it sum all the values for all labels (which is, in some sense, correct), or just return the previously set ANY value? Should we disallow setting ANY? (That seems a bit arbitrary...) My suggestion is that we introduce explicit getAll(Used, Pending, etc) accessors (not an ALL CommonNodeLabelsManager constant; I think that just moves/replicates the existing problem). There would be no corresponding setAll. getAll(XYZ) would iterate all labels in ResourceUsage for the passed ResourceType and return a total. 
For OrderingPolicy, the values should be cached on ResourceUsage instead of in SchedulableEntity for cases where that is needed - cloning an entire ResourceUsage would be expensive, inefficient, and unnecessary. We could add a separate cache collection in ResourceUsage, but I think it would actually be better to add values to the ResourceType enum: SCHEDULING_USED and SCHEDULING_PENDING. When updating the cached value for Used, OrderingPolicy would then call getAllUsed() on ResourceUsage and set the resulting value with set(ANY node label expression, SCHEDULING_USED ResourceType); for demand, getAllPending() and then set(ANY node label expression, SCHEDULING_PENDING). When getting the cached value, OrderingPolicy would call getUsed(ANY node label expression, SCHEDULING_USED ResourceType) and, for pending, getPending(ANY, SCHEDULING_PENDING). I'm inclined to roll forward with using ResourceUsage despite this additional scope to ease future use cases, but we need to be very careful about continuing to pull in additional change and complexity which is not required right now, and should avoid doing so again this iteration. It's good to aim for a stable API, but it's also good to complete the initial functionality, and to recognize that it's not possible to anticipate all future needs - it is highly likely there will be some change to APIs like this as the system evolves. 
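The getAll proposal above can be illustrated with a toy model. Everything here is a sketch of the suggestion, not the actual ResourceUsage class: memory-only longs stand in for the real Resource type, and the class/method names are taken from the comment's proposal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the proposal: keep per-label values and have getAll*
// accessors sum across every label present, rather than relying on a
// stored ANY entry (which may or may not have been set).
class ResourceUsageSketch {
    private final Map<String, Long> usedByLabel = new ConcurrentHashMap<>();
    private final Map<String, Long> pendingByLabel = new ConcurrentHashMap<>();

    void setUsed(String label, long mem)    { usedByLabel.put(label, mem); }
    void setPending(String label, long mem) { pendingByLabel.put(label, mem); }

    // Totals across all labels, so requests labeled something other than
    // NO_LABEL are not missed by fairness comparisons. No setAll exists.
    long getAllUsed()    { return usedByLabel.values().stream().mapToLong(Long::longValue).sum(); }
    long getAllPending() { return pendingByLabel.values().stream().mapToLong(Long::longValue).sum(); }
}
```

This sidesteps the ANY-semantics question: getAll* never consults a stored ANY value, it always totals the labels it finds.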
Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, YARN-3318.53.patch, YARN-3318.56.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.56.patch Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, YARN-3318.53.patch, YARN-3318.56.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487631#comment-14487631 ] Craig Welch commented on YARN-3318: --- bq. ...Do we really see non-comparator based ordering-policy. We are unnecessarily adding two abstractions - adding policies and comparators... In the context of the code so far, the comparator based approach is specific to compounding comparators to achieve functionality (priority + fifo, fair + fifo, etc). This was the initial motivation for the two-level API configuration: the broader surface of the policy, which would allow for different collection types, sorting on demand, etc. (the original policy), and the narrower one within that (comparator) for the cases where comparator logic was sufficient, e.g. where sharing a collection (for composition) and a collection type (a tree, for efficient resorting of individual elements when required) was possible. The two-level API configuration was not well received. Offline, Wangda has indicated that he thinks there are policies coming up which will need the wider, initial API, with control over the collection, sorting, etc. Supporting policy composition for those cases would be very awkward and is not really worth pursuing. The various competing tradeoffs - the aversion to a multilevel API, the need for the higher level API, and the ability to compose policies - create something of a tension; I don't think it's realistic to try to accomplish it all together, as the result would be Frankensteinian at best. Something has to go. Originally, I chose the multilevel API to resolve the dilemma, and I like that choice, but it seems unpopular with the crowd. Given that, the other optional dynamic is the ability to compose policies (there's no requirement for either of these as far as I can tell; both are bonus features). While I like the composition approach, it can't be maintained as such with the broader API and without the multilevel config/API. 
As one of these has to go, and it appears it can't be the broader API or the multilevel API, I suppose it will have to be composition. Internally there can be some composition of course, but it won't be transparent/exposed/configurable as it was initially. I'll put out a patch with that in a bit. Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
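The internal (no longer configurable) composition described above - e.g. fair-by-usage first, FIFO-by-id as tiebreak - is essentially comparator chaining. A sketch, with a hypothetical App type standing in for the framework's schedulable entities:

```java
import java.util.Comparator;

// Illustration of composing comparators inside a policy: the first
// comparator orders by least usage (fair), and ties fall through to
// lexical id order (FIFO). The App record is hypothetical.
class CompoundComparatorSketch {
    record App(String id, long used) {}

    static Comparator<App> compound() {
        Comparator<App> fair = Comparator.comparingLong(App::used); // least usage first
        Comparator<App> fifo = Comparator.comparing(App::id);       // arrival-order tiebreak
        return fair.thenComparing(fifo);
    }
}
```

The same chaining would support priority + fifo or any other stack of comparator-based policies, which is why the comment argues the compound comparator belongs in the framework rather than in any single policy.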
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.52.patch Update, removing composition in favor of broader interface Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485466#comment-14485466 ] Craig Welch commented on YARN-3293: --- Overall +1, looks good to me. One additional thing occurred to me when looking it over again - I think that CapacitySchedulerHealthInfo in the web dao is, for the most part, cross-scheduler. Does it make sense to factor most of it up into a generalized SchedulerHealthInfo with all the common pieces and extend it (to CapacitySchedulerHealthInfo) just for the CS specific constructor? Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch, apache-yarn-3293.5.patch, apache-yarn-3293.6.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
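The factoring suggested above amounts to pulling the scheduler-agnostic fields into a base DAO and keeping only the CS-specific constructor in the subclass. A minimal sketch (field names here are invented stand-ins, not the actual DAO fields):

```java
// Sketch of the suggested split: cross-scheduler fields in a base class,
// CapacityScheduler-specific construction in the subclass. The field
// names are illustrative, not the real CapacitySchedulerHealthInfo members.
class SchedulerHealthInfo {
    long lastAllocationTime;   // metrics any scheduler could report
    long lastReleaseTime;
}

class CapacitySchedulerHealthInfoSketch extends SchedulerHealthInfo {
    int pendingBacklog;        // stand-in for a CS-specific detail

    CapacitySchedulerHealthInfoSketch(long alloc, long release, int backlog) {
        this.lastAllocationTime = alloc;
        this.lastReleaseTime = release;
        this.pendingBacklog = backlog;
    }
}
```

A later FairScheduler DAO could then extend SchedulerHealthInfo with its own constructor while the web layer keeps rendering the common fields.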
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485557#comment-14485557 ] Craig Welch commented on YARN-3293: --- Your call - I think it's also fine to wait to do this until we do the FairScheduler integration, when we are clear on exactly what needs to happen (it may be premature to do it now; I'm not entirely sure), but ultimately I think as much as can be shared should be. Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, apache-yarn-3293.4.patch, apache-yarn-3293.5.patch, apache-yarn-3293.6.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label
[ https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485954#comment-14485954 ] Craig Welch commented on YARN-2696: --- Why? And what kind of consideration, exactly? Queue sorting in CapacityScheduler should consider node label - Key: YARN-2696 URL: https://issues.apache.org/jira/browse/YARN-2696 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan In the past, when trying to allocate containers under a parent queue in CapacityScheduler, the parent queue will choose child queues by the used resource from smallest to largest. Now we support node label in CapacityScheduler; we should also consider used resource in child queues by node labels when allocating resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486312#comment-14486312 ] Craig Welch commented on YARN-3318: --- bq. 1) Regarding OrderingPolicy and SchedulingOrder responsibilities: SchedulingOrder has multiple purposes, including: 1. Housing supporting code for using policies common across schedulers, e.g. a common implementation of behavior 2. Allowing for the composition of multiple policies together to accomplish desired queue behavior - it is awkward to factor the functionality in SchedulingOrder down into the policies, as multiple policies are in play for one instance of the logic in SchedulingOrder. Although I mentioned that it could be made abstract some day if needed, that's not its current purpose; the above are. bq. ...Looking at methods of OrderingPolicy, most of them are just pass through parameters to OrderingPolicy, and the rest of them are instantiating OrderingPolicies... Well, no, it has quite a lot of implementation logic around managing the SchedulerProcess collection and the interactions between it and multiple policies; it is certainly not limited to factory operations. bq. OrderingPolicy should be a per-queue instance or global library OrderingPolicies are per-queue and stateful in terms of configuration specific to that queue. For the reasons mentioned above regarding the composition of policies, they do not (and should not) maintain queue state (scheduler processes, etc). bq. Suggestion about OrderingPolicy interface design (if you agree with 1/2): I don't agree, so skipping the section. 
The essential thing that I think is being missed here is that there is an intentional desire to compose ordering policies for a queue to achieve behavior - so priority + fifo, or fair + fifo, etc. - and for that reason it is not appropriate to place the management of the collection of processes shared amongst policies into the policy implementation. It belongs outside, as it is today, in SchedulingOrder. Mixing these together defeats composition and also mixes concerns, making the code more (not less) complex and certainly less clean in terms of separation of concern and overall design and flow. bq. ...CompoundOrderingPolicy is implemenation detail for FairOrderingPolicy, don't need put in the patch... No, it isn't; it's a feature of the generalized framework to support multiple policies being composed for a queue. It's not specific to fairness at all (fairness may be the first user, but so might priority - in any case, any set of policies may use it; it's not specific to any one of them, and therefore it is framework...) bq. ...About spliting SchedulableProcess to App and Queue... I stand by my earlier explanation (and don't see anything here which alters it...): I anticipate that with the current factoring of SchedulerProcess we won't have to subtype it to support Queues. That said, the right time to do that is when we are adding such support; anticipatory complexity is the worst kind. It is factored such that adding the subtyping should be additive if it needs to happen, so there is no need to anticipate it now (the room is there to add it, which is all we need; we should wait to add it until we know we need it). bq. ...As I mentioned before, use ResourceUsage is much better... As I mentioned before, it doesn't presently supply the needed functionality; when it does, we can convert to it. 
Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
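The separation argued for in the comment above - the shared process collection living outside the policies, ordered by whatever comparator the composed policies supply - can be sketched as a toy holder. All names and the iterator-based access pattern here are illustrative assumptions, not the patch's actual API:

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;

// Toy version of the SchedulingOrder role: it owns the entity collection
// (a tree, for efficient resorting of individual elements) and takes the
// ordering from outside, so multiple composed policies can share one
// collection without any of them owning queue state.
class SchedulingOrderSketch<E> {
    private final TreeSet<E> entities;

    SchedulingOrderSketch(Comparator<E> policyComparator) {
        entities = new TreeSet<>(policyComparator);
    }

    void add(E e)    { entities.add(e); }
    void remove(E e) { entities.remove(e); }

    // Per the threading model discussed on YARN-3318: callers (the
    // schedulers) must synchronize externally around this iterator.
    Iterator<E> assignmentIterator() { return entities.iterator(); }
}
```

Because the collection is injected with a comparator rather than built into a policy, swapping fair + fifo for priority + fifo changes only the comparator, not the collection management.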
[jira] [Created] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
Craig Welch created YARN-3463: - Summary: Integrate OrderingPolicy Framework with CapacityScheduler Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484368#comment-14484368 ] Craig Welch commented on YARN-3319: --- Apply after applying YARN-3318 and YARN-3463 Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
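The sizeBasedWeight adjustment described in the issue divides an application's usage by log2(1 + demand), so an application demanding more resources compares as if it were using proportionally less. A minimal sketch of that arithmetic (the class and method names are hypothetical; only the formula comes from the issue text):

```java
public class SizeBasedWeightSketch {
    // Weighted usage for comparison: usage / (log1p(demand) / log(2)),
    // i.e. divided by log2(1 + demand). Falls back to raw usage when the
    // weight would be zero (demand of 0), to avoid dividing by zero.
    static double weightedUsage(long usedMemory, long demandMemory) {
        double weight = Math.log1p(demandMemory) / Math.log(2);
        return weight > 0 ? usedMemory / weight : usedMemory;
    }

    public static void main(String[] args) {
        // Same raw usage, but the app demanding more memory compares as
        // smaller, so it would be offered the next allocation first.
        System.out.println(
            weightedUsage(1024, 8192) < weightedUsage(1024, 1024)); // prints true
    }
}
```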
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.50.patch Must apply YARN-3318 patch first Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3463.50.patch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Summary: Implement a FairOrderingPolicy (was: Implement a Fair SchedulerOrderingPolicy) Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending) Current resource usage - less usage is lesser Submission time - earlier is lesser Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Summary: Create Initial OrderingPolicy Framework and FifoOrderingPolicy (was: Create Initial OrderingPolicy Framework) Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Description: Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy (was: Create the initial framework required for using OrderingPolicies) Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395508#comment-14395508 ] Craig Welch commented on YARN-3318: --- [~vinodkv] bq. ...We can strictly focus on the policy framework here... Sure, limited patch to framework bq. ...You could also say SchedulableProcess... SchedulableProcess it is, done bq. I agree to this, but we are not in a position to support the APIs, CLI, config names in a supportable manner yet. They may or may not change depending on how parent queue policies, limit policies evolve. For that reason alone, I am saying that (1) Don't make the configurations public yet, or put a warning saying that they are unstable and (2) don't expose them in CLI, REST APIs yet. It's okay to put in the web UI, web UI scraping is not a contract. You can't see it, because it's part of Capacity Scheduler Integration, but I removed the CLI and proto related changes. There was no REST API change; the web UI change is still present. Will warn unstable when added to config files in the scheduler integration patch bq. SchedulerApplicationAttempt.getDemand() should be private Done bq. updateCaches() - updateState() / updateSchedulingState() as that is what it is doing? getCachedConsumption() / getCachedDemand(): simply getCurrent*() ? What is the need for reorderOnContainerAllocate () / reorderOnContainerRelease()? These are now getSchedulingConsumption(), getSchedulingDemand(), and updateSchedulingState(). The reordering is needed because mutable values which are used for ordering cannot be allowed to change while an item is in the tree - otherwise, in some cases, it will not be found during the delete-before-reinsert process which occurs when a schedulable's mutable comparison values change (for fairness, changes to consumption and potentially demand). Not all OrderingPolicies require reordering on these events; for efficiency they get to decide whether they do or not, hence the reorderOn. 
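The delete-before-reinsert hazard described above can be illustrated with a plain TreeSet: if a comparison key is mutated while the element is in the set, the comparator-driven lookup can miss the element's node, so the element must be removed before its keys change and reinserted afterwards. The classes below are illustrative stand-ins, not the actual YARN types:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class ReorderSketch {
    // Hypothetical stand-in for a schedulable with a mutable ordering key.
    static class App {
        long consumption;
        final int id;
        App(int id, long consumption) { this.id = id; this.consumption = consumption; }
    }

    static final Comparator<App> BY_USAGE =
        Comparator.comparingLong((App x) -> x.consumption).thenComparingInt(x -> x.id);

    // Wrong: the key changes while the element sits in the set, so remove()
    // searches along the comparator path and misses the node.
    static boolean staleRemove() {
        TreeSet<App> queue = new TreeSet<>(BY_USAGE);
        App a = new App(1, 10);
        queue.add(a);
        queue.add(new App(2, 20));
        queue.add(new App(3, 30));
        a.consumption = 25;          // mutated in place: tree still thinks it's 10
        return queue.remove(a);      // lookup goes the wrong way; element is "lost"
    }

    // Right: remove first (keys still match), mutate, then reinsert.
    static int reorderedFirstId() {
        TreeSet<App> queue = new TreeSet<>(BY_USAGE);
        App a = new App(1, 10);
        queue.add(a);
        queue.add(new App(2, 20));
        queue.add(new App(3, 30));
        queue.remove(a);             // findable: keys unchanged
        a.consumption = 25;
        queue.add(a);
        return queue.first().id;     // ordering now reflects the new usage
    }

    public static void main(String[] args) {
        System.out.println(staleRemove() + " " + reorderedFirstId());
    }
}
```

This is why comparison values must be held stable (the getScheduling* snapshot) between the remove and the reinsert.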
The reorderOn methods are now reorderForContainerAllocation and reorderForContainerRelease. bq. Move all the comparator related classes into their own package No longer needed, as comparators are now just a property of policies; see below for details bq. This is really a ComparatorBasedOrderingPolicy. Do we really see non-comparator based ordering-policy. We are unnecessarily adding two abstractions - adding policies and comparators Originally, there was a perceived need to be able to support a more flexible interface than the comparator one, but also a desire to build up a simpler, composable abstraction to be used with an instance of the former which had most of the hard stuff done. Given that all of the policies we've contemplated building fit the latter abstraction, and the level of flexibility does not appear to actually be that different, I think it's fair to say that we only need what was previously the SchedulerComparator abstraction as a plugin point. Given that, a slightly refactored version of the SchedulerComparator abstraction is now the only plugin point and is what goes by the name of OrderingPolicy. What was previously the OrderingPolicy is now a single concrete class implementing the surrounding logic, meant to be usable from any scheduler, named SchedulingOrder. So, one abstraction: a comparator-based ordering-policy. If we really do find we need a flexibility we don't have some day, the SchedulingOrder class could be abstracted to provide that higher level abstraction - but as we see no need for it now, and it appears we probably never will, there's no reason to do so at present bq. ...Use className.getName()... Done [~leftnoteasy] bq. ...I prefer what Vinod suggested, split SchedulerProcess to be QueueSchedulable and AppSchedulable ... I don't see that he has suggested that. In any case, with the removal of *Serial* and the move to compareInputOrderTo() I don't at present see a need to have separate subtypes for app and queue to avoid dangling properties. 
And, I think if we do it right we won't end up introducing them. By splitting in the suggested way we commit ourselves to either multiple comparators (to use the differing functionality) or awkward testing of subtype/etc logic in one comparator - so it basically moves the complexity/awkwardness, it doesn't eliminate it. I've refactored such that the Policy now provides a Comparator as opposed to extending it, so there is now room for it to provide multiple comparators and handle subtypes if need be, but I think we should wait until we see that we must do that before doing so, as I don't believe we will end up needing to (but if we do, existing code should need little change, and implementing what you suggest should be essentially additive...) bq. ...About inherit relationships between interfaces/classes... Policies will be composed to achieve combined capabilities yet the collection of
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Summary: Create Initial OrderingPolicy Framework (was: Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior) Create Initial OrderingPolicy Framework --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Description: Create the initial framework required for using OrderingPolicies (was: Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior.) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.45.patch Create Initial OrderingPolicy Framework --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.45.patch Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Description: Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison was: Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). 
This will include conditional support for sizeBasedWeight style adjustment An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending) Current resource usage - less usage is lesser Submission time - earlier is lesser Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser) Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). 
This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.47.patch Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.47.patch Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.48.patch The javac error looks bogus - the existing error has simply moved. The findbugs warning looks bogus - the class it's complaining about is static; uploading a new version to see if it notices now. TestFairScheduler passes on my box with the patch, and I can't see any way it would be affected. Tests will rerun with the new patch, so we'll see. Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.39.patch Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending) Current resource usage - less usage is lesser Submission time - earlier is lesser Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.39.patch Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392211#comment-14392211 ] Craig Welch commented on YARN-3318: --- [~leftnoteasy] SchedulerProcessEvents replaced with containerAllocated and containerReleased Serial and SerialEpoch replaced with compareInputOrderTo(), which is option 2 for addressing it, which we settled on offline Added addSchedulerProcess/removeSchedulerProcess/addAllSchedulerProcesses Changed configuration so that yarn.scheduler.capacity.root.default.ordering-policy=fair will set up the fair configuration, fifo will set up fifo, fair+fifo will set up compound fair + fifo, etc. It is possible to set up a custom ordering policy class using a different configuration, but the base one will handle the friendly setup. [~vinodkv] bq. It is not entirely clear how the ordering and limits work together - as one policy with multiple facets or multiple policy types This should be modeled as different types of policies, so that they can each focus on their particular purpose and avoid a repetition of the intermingling which has made it difficult to mix, match, and share capabilities. Having multiple policy types is essential to make it easy to combine them as needed. bq. let's split the patch that exposes this to the client side / web UI and in the API records into its own JIRA...premature to support this as a publicly supportable configuration... The goal is to make this available quickly but iteratively, keeping the changes small but making them available for use and feedback. Clearly we can mark things unstable, communicate that they are not fully mature/subject to change/should be used gently, but we will need to make it possible to activate the feature and use it in order to accomplish the use and feedback. We should grow it organically, gradually, iteratively - think of it as a facet of the policy framework hooked up and available but with more to follow bq. 
...SchedulableEntity better... well, I'd actually talked [~leftnoteasy] into SchedulerProcess :-) So, we can chew on this a bit more and see where we go bq. You add/remove applications to/from LeafQueue's policy but addition/removal of containers is an event... This has been factored differently along the lines of [~leftnoteasy]'s suggestion; it should now be consistent bq. The notion of a comparator doesn't make sense to an admin. It is simply a policy... Have modeled policy configuration differently, so the comparator is out of sight (see above). bq. Depending on how ordering and limits come together, they may become properties of a policy I expect them to be distinct - this is specifically an ordering-policy, limits will be other types of limit-policy(ies) patch with these changes to follow in a few... Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
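The friendly configuration mapping described in the comment above (yarn.scheduler.capacity.root.default.ordering-policy set to fair, fifo, fair+fifo, etc.) could be parsed roughly as below: split the property value on + and validate each token before composing the policies in order. This is a hypothetical sketch of the parsing step only - the helper name is invented, and only the policy names mentioned in the comment are recognized:

```java
import java.util.ArrayList;
import java.util.List;

public class OrderingPolicyConfigSketch {
    // Parse a spec like "fair", "fifo", or "fair+fifo" into an ordered
    // list of policy names; unknown names fail fast with a clear message.
    static List<String> parsePolicySpec(String spec) {
        List<String> policies = new ArrayList<>();
        for (String token : spec.trim().split("\\+")) {
            String name = token.trim().toLowerCase();
            if (!name.equals("fair") && !name.equals("fifo")) {
                throw new IllegalArgumentException("unknown ordering policy: " + name);
            }
            policies.add(name);
        }
        return policies;
    }

    public static void main(String[] args) {
        // The composed order matters: fair first, fifo breaking ties.
        System.out.println(parsePolicySpec("fair+fifo")); // prints [fair, fifo]
    }
}
```

A custom ordering-policy class would bypass this mapping via a separate configuration key, per the comment.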
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393194#comment-14393194 ] Craig Welch commented on YARN-3293: --- Hey [~vvasudev], it seems that the patch doesn't apply cleanly, can you update to latest trunk? Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393587#comment-14393587 ] Craig Welch commented on YARN-3293: --- General - it looks like the counters could possibly overflow and provide negative values; perhaps this is not something which could happen in the lifetime of a cluster, but for a large, long-running cluster, is it a possibility/concern? This presently looks to be capasched only; I had a suggestion to make it slightly more general below, and [~vinodkv] also mentioned it's not specific to the scheduler - perhaps it's fine to go capasched only for the first iteration, but wanted to verify (perhaps we need a follow-on jira for other schedulers). On the web page: it's a nit, but I find I don't like the look of the / between the counter and the resource expression where that occurs - maybe - instead of / for those (allocations/reservations/releases)? TestSchedulerHealth: can we import NodeManager and get rid of package references in code CapacitySchedulerHealthInfo: looks like there is no need to keep a reference to the CapacityScheduler instance after construction, can we drop it from being a member then? Looks like the line changes in the info log are just whitespace, can you drop them? LeafQueue L884 looks to be just whitespace, can you revert? CSAssignment: I think that there should be a new class, sharable between schedulers, which incorporates all the new assignment info, and that it should be a member of CSAssignment instead of adding all of the details directly to CSAssignment. 
You would still pack the info into CSAssignment (as an instance of that type), but it would now take a form that can be shared across schedulers Track and display capacity scheduler health metrics in web UI - Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
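The overflow concern raised above is easy to demonstrate: a Java `long` counter that is only ever incremented silently wraps to a negative value once it passes `Long.MAX_VALUE`. A toy illustration (class and method names are hypothetical, not from the patch):

```java
// Toy sketch of the counter-overflow concern: incrementing past
// Long.MAX_VALUE wraps to Long.MIN_VALUE, producing a negative metric.
public class CounterOverflow {
    static long increment(long counter) {
        return counter + 1; // silently wraps at Long.MAX_VALUE
    }

    public static void main(String[] args) {
        System.out.println(increment(Long.MAX_VALUE)); // -9223372036854775808
    }
}
```

In practice, whether this matters depends on the increment rate; a per-allocation counter would take far longer than any realistic cluster lifetime to wrap, which is why the comment hedges the concern.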
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388275#comment-14388275 ] Craig Welch commented on YARN-3318: --- Hi Wangda, I have changed the patch a bit in the background without updating it on the jira. The changes are not major but I think they render some of the comments obsolete. I've uploaded the up-to-date patch just now; before doing so I took a pass through your comments - below I'll respond to each in turn. bq. 1) SchedulerProcess bq. 1.1. ...name seems not very straightforward to me... Well, I'm certainly open to other name options, but I do prefer SchedulerProcess to SchedulableEntity - it's common to refer to the items which a scheduler will schedule as Processes, which is what these are in this case, and that is why I chose this name. Entity is really very generic and empty of meaning. I do wish to avoid confusion with Schedulable (and wasn't enamored of that name either...). I expect that as integration progresses there will be a period where Schedulable will be an extension of SchedulerProcess with the (remaining) FairScheduler specific bits (which will, I think, ultimately be incorporated in some way into SchedulerProcess, but that's down the line/should be addressed in further iteration). In any case, I'm not in favor of adding Entity; I think when you consider the terminology as explained above SchedulerProcess works - try it on and see, and feel free to give other options... bq. 1.2. ...SchedulerProcessEvent...asynchronized Not all event handling must be asynchronous. I believe the details regarding this were spelled out reasonably well in the interface definition - if you take a peek at how these events are handled in the capacity scheduler configuration you will see that they are handled synchronously/safely, within protection against mutation of the SchedulerProcess collection.
My goal was precisely to avoid needing to have implementers add a new method implementation every time a new event comes into play which may not be of interest to them; this makes maintenance of implementations easier - they can manage the events which they understand and have appropriate default logic otherwise. I think this is a classic case for an enumerated set of events to be handled by the interface, so I think it should be modeled as it is, as opposed to adding a new method for each new event type to the interface itself... bq. 1.3 ...SerialEpoch Yeah, I don't like the names much either. I've gone through several versions and come to the conclusion that it's not the choice of names that's the problem. This is an attempt to hide the application ids while also exposing them, which is compound in nature, and is made stranger by the fact that this is totally irrelevant for other potential future implementors (such as queues). I want to factor it differently, not just change the names; these are the courses I'm considering: 1. Have SchedulerProcess implement Comparable and provide a natural ordering, which is fifo for apps. This seems to privilege fifo but, as a matter of fact, it's the fallback for fair, so I'm not sure that's really an inappropriate thing to do - it seems like it is the natural ordering for apps. Other things can give their own natural ordering (queues - the hierarchy...), so it should extend reasonably well without the current awkwardness. This would remove all of getSerialEpoch, getSerial, and getId in favor of just implementing compareTo from Comparable. The downsides I see are the privileging and that if an implementor of SchedulerProcess implemented Comparable in an unworkable fashion it would be an issue - not the case for what we are presently looking at supporting, afaik. 2. Have an explicit compareCreationOrder(SchedulerProcess other) method which returns 0/+/- like compareTo.
This is much like 1, but removes the privilege and the possible Comparable collision... this also does away with getSerialEpoch, getSerial, and getId in favor of the comparison method. What do you think of these options? Preference? BTW, FS has an actual startTime for fsappattempts, but looking through it I don't like that approach - it doesn't appear to do the right thing in some cases (like rm failover or recovery), and it can still be ambiguous in others (simultaneous starts within timestamp-millis granularity), where there's a fallback to appid. So it doesn't really add anything - you still have to be able to fall back to the app id for those cases, so you can't get away from the issue - and it adds a bit of complexity to boot. bq. 1.4 ...currentConsumption is not enough to make choice, demand(pending-resource)/used and priority/weight are basic fields of a Schedulable, do you think so... Of those, only demand is required for the initial step of supporting application level fairness when sizeBasedWeight is active, the others are only
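Option 2 above can be sketched as follows. This is a minimal, illustrative shape only - the interface method is named in the comment, but the `AppProcess` class and its epoch/serial fields are hypothetical stand-ins for however creation order would actually be tracked (e.g. across RM restarts):

```java
// Sketch of option 2: an explicit creation-order comparison on the
// SchedulerProcess interface, replacing getSerialEpoch/getSerial/getId
// and avoiding a collision with any Comparable implementation.
interface SchedulerProcess {
    // negative if this was created before other, positive if after, 0 if equal
    int compareCreationOrder(SchedulerProcess other);
}

// Hypothetical implementer: orders by a restart "epoch" first, then by a
// per-epoch serial (e.g. the application id sequence number).
class AppProcess implements SchedulerProcess {
    private final long epoch;
    private final long serial;

    AppProcess(long epoch, long serial) {
        this.epoch = epoch;
        this.serial = serial;
    }

    public int compareCreationOrder(SchedulerProcess other) {
        AppProcess o = (AppProcess) other; // sketch: assume same concrete type
        int byEpoch = Long.compare(epoch, o.epoch);
        return byEpoch != 0 ? byEpoch : Long.compare(serial, o.serial);
    }
}
```

The appeal of this shape is exactly what the comment argues: fifo ordering is available to any policy that wants a fallback, without privileging it as the type's natural ordering.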
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.35.patch Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.35.patch Apply after applying YARN-3318.35.patch Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending) Current resource usage - less usage is lesser Submission time - earlier is lesser Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
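The sizeBasedWeight adjustment described above divides an application's usage by `Math.log1p(demand) / Math.log(2)` (i.e. log base 2 of 1 + demand), so larger applications see their usage discounted more, offsetting the fair policy's natural preference for small applications. A minimal sketch, with hypothetical method and parameter names (only the formula itself comes from the description):

```java
// Illustrative sketch of the sizeBasedWeight adjustment: usage divided by
// log2(1 + memory demand). Names are hypothetical; the formula is from the
// issue description.
public class SizeBasedWeightDemo {
    static double adjustedUsage(double usedMemory, double demandMemory) {
        double weight = Math.log1p(demandMemory) / Math.log(2); // log2(1 + demand)
        return usedMemory / weight;
    }

    public static void main(String[] args) {
        // Two apps with equal current usage: the one with larger demand gets a
        // smaller adjusted usage, so the fair comparator prefers it.
        System.out.println(adjustedUsage(1024, 8192) < adjustedUsage(1024, 512));
    }
}
```

Using `log1p` keeps the weight well defined even for zero demand (log2(1) = 0 would divide by zero only at exactly zero demand, where there is nothing to schedule anyway), and the logarithm keeps the boost sub-linear so huge applications do not completely dominate.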
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.36.patch Fixes for release audit warnings Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389563#comment-14389563 ] Craig Welch commented on YARN-3318: --- The remaining javac error doesn't appear to be related to my changes, which is confusing. The next patch will have a change to try to address it anyway. TestRM passes on my box, so I assume it's a transient issue. Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389583#comment-14389583 ] Craig Welch commented on YARN-3318: --- BTW, we can't just do a lexical sort on the string version of the application id - one problem with using the lexical compare on appid is that the format for the id component is a min of 4 digits, which means that going from to 1 will result in an incorrect lexical sort wrt the actual order Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
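The pitfall described in the comment above is easy to reproduce: because the id component is zero-padded to a minimum of 4 digits, ids sort correctly as strings only until they exceed 9999, after which the 5-digit ids compare lexically smaller than 4-digit ones. A small demonstration (the cluster-timestamp value is made up for illustration):

```java
// Demonstrates why a lexical sort on the string form of an application id
// breaks: "9999" > "10000" lexically, though 9999 < 10000 numerically.
public class LexicalIdDemo {
    public static void main(String[] args) {
        String a = String.format("application_1425000000000_%04d", 9999);
        String b = String.format("application_1425000000000_%04d", 10000);
        // Numerically a's id precedes b's, but the string comparison disagrees.
        System.out.println(a.compareTo(b) > 0); // true
    }
}
```

This is why the framework compares the numeric id components directly rather than relying on the string form.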
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.34.patch Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Description: Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending) Current resource usage - less usage is lesser Submission time - earlier is lesser Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser) was:Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. 
Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending) Current resource usage - less usage is lesser Submission time - earlier is lesser Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.17.patch With support for configuration via the scheduler's config file Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.17.patch With support for configuration via the scheduler's config file Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360652#comment-14360652 ] Craig Welch commented on YARN-3306: --- Thanks for your thoughts, [~kasha] The immediate proposal is to begin adding new functionality in a fashion which can be easily shared across scheduler implementations and mixed together in a single cluster. The first case is to support container assignment and preemption orderings in addition to fifo for applications in the capacity scheduler, and potentially the fair scheduler, using the same code; this is expected to expand to cover queue relationships and potentially other behaviors (limits, etc.) over time. The hope is that this allows us to iterate toward a state where the various behaviors of the schedulers can be mixed, matched, and shared across implementations, rather than having to try to accomplish this all in one go, and allows us to achieve the benefit of mixing and matching some of the features earlier/along the way. I suspect that at some point we'll hit a critical mass where enough of the functionality has been extracted to sharable components and where we've been able to establish an understanding of how these can be made to compose well, and then we'll take that as an inflection point and go down the path you are suggesting: introduce a new scheduler to house the policies and in that way complete the picture, deprecating the others. That's by no means the only possible conclusion, but it seems to be a good and/or likely one. [Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf Scheduling layout in Apache Hadoop YARN today is very coarse grained.
This proposal aims at converting today's rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and implement a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent queue level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353397#comment-14353397 ] Craig Welch commented on YARN-2495: --- -re bq. How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration? As I understand the requirements, it's necessary to handle the case where the derived set of labels changes during the lifetime of the nodemanager. E.g., external libraries might be installed, or some other condition may change which affects the labels; no nodemanager re-registration is involved, and yet the changed labels need to be reflected Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
Craig Welch created YARN-3318: - Summary: Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch reassigned YARN-3318: - Assignee: Craig Welch Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
Craig Welch created YARN-3319: - Summary: Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.13.patch Initial, incomplete patch with the overall framework implementation of the SchedulerComparatorPolicy and FifoComparator; the major TODO is integrating with the capacity scheduler configuration. Also includes a CompoundComparator for chaining comparator based policies where desired. Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353965#comment-14353965 ] Craig Welch commented on YARN-3318: --- The proposed initial implementation of the framework to support FIFO SchedulerApplicationAttempt ordering for the CapacityScheduler: A SchedulerComparatorPolicy which implements OrderingPolicy above. This implementation will take care of the common logic required for cases where the policy can be effectively implemented as a comparator (which is expected to be the case for several potential policies, including FIFO). A SchedulerComparator which is used by the SchedulerComparatorPolicy above. This is an extension of the java Comparator interface with additional logic required by the SchedulerComparatorPolicy, initially a method to accept SchedulerProcessEvents and indicate whether they require re-ordering of the associated SchedulerProcess. Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353953#comment-14353953 ] Craig Welch commented on YARN-3318: --- Proposed elements of the framework: A SchedulerProcess interface which generalizes processes to be managed by the OrderingPolicy (initially; potentially in the future by other Policies as well). The initial implementer will be the SchedulerApplicationAttempt. An OrderingPolicy interface which exposes a collection of scheduler processes which will be ordered by the policy for container assignment and preemption. The ordering policy will provide one Iterator which presents processes in the policy specific order for container assignment and another Iterator which presents them in the proper order for preemption. It will also accept SchedulerProcessEvents which may indicate a need to re-order the associated SchedulerProcess (for example, after container completion, preemption, assignment, etc.) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
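The proposed elements can be sketched in miniature. This is an illustrative shape based only on the comment above - the interface and method names follow its wording, but the `FifoOrderingPolicy` implementation and the `getId()` accessor are hypothetical stand-ins, not the committed API:

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;

// The process abstraction the policy orders; getId() stands in for whatever
// identity/creation-order information the real SchedulerProcess exposes.
interface SchedulerProcess {
    String getId();
}

// OrderingPolicy as described: one iterator for assignment order, another
// (reversed) for preemption, over a mutable collection of processes.
interface OrderingPolicy<S extends SchedulerProcess> {
    void addSchedulerProcess(S s);
    void removeSchedulerProcess(S s);
    Iterator<S> getAssignmentIterator();
    Iterator<S> getPreemptionIterator();
}

// A comparator-backed policy: fifo assignment order falls out of ordering by
// id, and preemption order is simply the reverse iteration.
class FifoOrderingPolicy<S extends SchedulerProcess> implements OrderingPolicy<S> {
    private final TreeSet<S> processes =
        new TreeSet<>(Comparator.comparing(SchedulerProcess::getId));

    public void addSchedulerProcess(S s) { processes.add(s); }
    public void removeSchedulerProcess(S s) { processes.remove(s); }
    public Iterator<S> getAssignmentIterator() { return processes.iterator(); }
    public Iterator<S> getPreemptionIterator() { return processes.descendingIterator(); }
}

// Trivial implementer for demonstration.
class MockProcess implements SchedulerProcess {
    private final String id;
    MockProcess(String id) { this.id = id; }
    public String getId() { return id; }
}
```

Re-ordering on SchedulerProcessEvents would, in a comparator-backed policy like this, amount to removing and re-inserting the affected process so the TreeSet re-sorts it.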
[jira] [Commented] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354006#comment-14354006 ] Craig Welch commented on YARN-3319: --- Initially this will be implemented for SchedulerApplicationAttempts in the CapacityScheduler LeafQueue (similar to the FIFO implementation in [YARN-3318]). The expectation is that this will implement the SchedulerComparator interface and will be used as a comparator within the SchedulerComparatorPolicy implementation to achieve the intended behavior. Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.13.patch Attaching initial/incomplete patch, it depends on the [YARN-3318] patch of the same index - it is just the additional logic specific to Fairness. Major TODO, sizeBasedWeight. Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3320) Support a Priority OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354035#comment-14354035 ] Craig Welch commented on YARN-3320: --- The initial intent is to bring the appropriate parts of the implementation of ApplicationPriorities from [YARN-2004] into the OrderingPolicy framework as a SchedulerComparator which can be composed with Fair and Fifo comparators to achieve Fair and Fifo behavior WITHIN priority bands Support a Priority OrderingPolicy - Key: YARN-3320 URL: https://issues.apache.org/jira/browse/YARN-3320 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch When [YARN-2004] is complete, bring relevant logic into the OrderingPolicy framework -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.14.patch Same as .13 except it should be possible to apply with [YARN-3319] 's .14 patch Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3320) Support a Priority OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3320: -- Summary: Support a Priority OrderingPolicy (was: Support a Priority SchedulerOrderingPolicy composible with Fair and Fifo ordering) Support a Priority OrderingPolicy - Key: YARN-3320 URL: https://issues.apache.org/jira/browse/YARN-3320 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch When [YARN-2004] is complete, bring relevant logic into the OrderingPolicy framework -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3320) Support a Priority SchedulerOrderingPolicy composible with Fair and Fifo ordering
Craig Welch created YARN-3320: - Summary: Support a Priority SchedulerOrderingPolicy composible with Fair and Fifo ordering Key: YARN-3320 URL: https://issues.apache.org/jira/browse/YARN-3320 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch When [YARN-2004] is complete, bring relevant logic into the OrderingPolicy framework -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.14.patch Same as .13, except it should be possible to apply this patch after applying [YARN-3318] 's .14 patch Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338848#comment-14338848 ] Craig Welch commented on YARN-3251: --- Sorry if that wasn't clear, to reduce risk removed the minor changes in CSQueueUtils CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2-6-0.3.patch Removing the csqueueutils CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2-6-0.4.patch Minor, switch to Internal, seems to be more common in the codebase CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2.patch Attaching an analogue of the most recent patch against trunk. I do not believe that we will be committing this at this point as [~leftnoteasy] is working on a more significant change which will remove the need for it, but I wanted to make it available just in case. For clarity, patch against trunk is YARN-3251.2.patch and the patch to commit against 2.6 is YARN-3251.2-6-0.4.patch. CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2-6-0.2.patch Patch against branch-2.6.0 CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337962#comment-14337962 ] Craig Welch commented on YARN-3251: --- bq. 1) Since the target of your patch is to make a quick fix for old version, it's better to create a patch in branch-2.6 done bq. And patch I'm working on now will remove the CSQueueUtils.computeMaxAvailResource, so it's no need to add a intermediate fix in branch-2. I suppose that depends on whether anyone needs a trunk version of the patch before the other changes are landed - if someone asks for it I could quickly update the original patch to provide it bq. 2) I think CSQueueUtils.getAbsoluteMaxAvailCapacity doesn't hold child/parent's lock together, maybe we don't need to change that, could you confirm? it doesn't, the change there was to ensure consistency for multiple values used from the queue, as previously it was occurring inside a lock and that was guaranteed, now it isn't. However, there's no need to lock on the parent, so I removed that bq. 3) Maybe we don't need getter/setter of absoluteMaxAvailCapacity in queue, a volatile float is enough? Yes, that should be safe, done CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
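The third review point above (replacing a synchronized getter/setter pair with a volatile float) can be illustrated with a minimal sketch. The class name is invented; the point is the Java memory-model guarantee, not the real CSQueue code:

```java
// Minimal illustration of the "volatile float is enough" point above:
// for a single float written by one thread and read by others, a
// volatile field gives visibility without any locking. In Java, reads
// and writes of float are atomic, and volatile adds the happens-before
// visibility guarantee. AbstractQueueSketch is a made-up class.
class AbstractQueueSketch {
    private volatile float absoluteMaxAvailCapacity = 0f;

    void updateAbsoluteMaxAvailCapacity(float v) {
        absoluteMaxAvailCapacity = v; // single atomic, visible write
    }

    float getAbsoluteMaxAvailCapacity() {
        return absoluteMaxAvailCapacity; // lock-free read
    }
}
```

This works because the field is a single word and there is no compound read-modify-write; a multi-field invariant would still need a lock.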
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335765#comment-14335765 ] Craig Welch commented on YARN-3251: --- It looks like this can occur when a call which walks down the queue tree (in this case, getQueueInfo()) happens at the same time as an assignContainers call which does not start from the root queue, which is specifically one for a reservedContainer where scheduleAsynchronously is false. Essentially, it isn't safe to hold a lock on a queue while locking on a parent queue (as I now see noted in other methods in LeafQueue :/). [YARN-3243] is potentially a long term fix, but it would be nice to fix this right away as it clearly is already problematic. Also, [YARN-3243] depends on a number of other sizable changes which have gone in recently, meaning it will be difficult to apply it as a fix to older codebases, for which it would be very nice to have a fix. I've attached a patch somewhat along the lines suggested by [~sunilg], it simply moves the acquisition of the absoluteMaxAvailCapacity outside the lock on the leaf queue - it will lock parent queues individually as it ascends, but it never holds a parent and child lock simultaneously, which is the unacceptable state. It follows the pattern for other methods in LeafQueue like recoverContainer which access parent queues - they all are careful to make sure the parent queue access occurs outside any lock on themselves. Unfortunately it's not possible to just do this in root.assignContainers because of the reservedContainer case which will not invoke assignContainers on the root queue at any point. Instead, absoluteMaxAvailCapacity is determined outside any lock on the leaf queue in assignContainers before entering the synchronized method which continues the logic as it is today. This looks to me to be the way to fix the issue with the smallest code change today pending other changes coming down the line. 
CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
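The lock-ordering rule the comment above describes (never hold a child queue's lock while acquiring the parent's) can be sketched in a simplified form. All names and fields here are invented for illustration; the real patch operates on CSQueue/LeafQueue:

```java
// Hypothetical sketch of the deadlock fix described above: read the
// parent's value BEFORE entering the child's synchronized section, so
// a top-down traversal (parent lock -> child lock) can never interleave
// with a bottom-up one (child lock -> parent lock) and deadlock.
class ParentQueue {
    private float absMaxAvailCapacity = 1.0f;
    synchronized float getAbsMaxAvailCapacity() {
        return absMaxAvailCapacity;
    }
}

class LeafQueueSketch {
    private final ParentQueue parent;
    LeafQueueSketch(ParentQueue parent) { this.parent = parent; }

    float assignContainers() {
        // Parent access happens here, with NO lock held on this queue.
        float capacity = parent.getAbsMaxAvailCapacity();
        return assignContainersInternal(capacity);
    }

    private synchronized float assignContainersInternal(float capacity) {
        // ...allocation logic runs under the leaf lock only, with the
        // parent's value passed in as a plain argument...
        return capacity;
    }
}
```

The essential property is that at no point is a parent lock requested while the leaf lock is held, matching the pattern the comment cites from recoverContainer.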
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.1.patch CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327716#comment-14327716 ] Craig Welch commented on YARN-2495: --- So, here's my proposal [~Naganarasimha] [~leftnoteasy], take a minute and consider whether or not DECENTRALIZED_CONFIGURATION_ENABLED is more likely to cause difficulty than prevent it, as I'm suggesting, and then you all can decide to keep it or not as you wish - I don't want to hold up the way forward over something which is, on the whole, a detail... Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323076#comment-14323076 ] Craig Welch commented on YARN-2495: --- My point is that everything necessary to manage labels properly exists without DECENTRALIZED_CONFIGURATION_ENABLED, it is a duplication of existing functionality. The user controls this by: 1. choosing to specify or not specify a way of managing the nodes at the node manager 2. choosing to set or not set node labels and associations using the centralized apis ergo, DECENTRALIZED_CONFIGURATION_ENABLED is completely redundant, it provides no capabilities not already present. Users will need to understand how the feature works to use it effectively anyway, there is no value add by requiring that they repeat themselves (both by specifying a way of determining node labels at the node manager level and by having to set this switch.). My prediction is that, if the switch is present, its chief function will be to confuse and annoy users when they set up a configuration for the node managers to generate node labels and then the labels don't appear in the cluster as they expect them to. 
Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297783#comment-14297783 ] Craig Welch commented on YARN-1039: --- [~chris.douglas] bq. YARN shouldn't understand the lifecycle for a service or the progress/dependencies for task containers That's not necessarily so, there are some cases where the type of life cycle for an application is important, for example, when determining whether or not it is open-ended (service) or a batch process which entails a notion of progress (session), at least for purposes of display. I think we need to rescope and clarify this jira a bit so that we can make progress - there are a number of items in the original problem statement and subsequent comments which have been taken on elsewhere and so really no longer make sense to pursue here. Here's an attempt at a breakdown: bq. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node I think this is now clearly covered by [YARN-796], nodes having qualities (including operational qualities such as these) is one of the core purposes of this work, it makes no sense to duplicate it here, and so it should be de-scoped from this jira bq. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node As [~ste...@apache.org] mentioned in an earlier comment [https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14038041page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14038041] affinity / anti-affinity is covered in a more general sense in [YARN-1042]. The above component of this jira is really just such a case, and so it should be covered with that general solution and dropped from scope as well. 
There may be some interest in informing that solution based on a generalized service setting, but to really understand that the affinity approach needs to be worked out - and I think the affinity approach will really need to inform/integrate with this rather than the other way around, and integration should be approached as part of that effort That leaves nothing, so we can close the jira ;-) Not quite, there were several things added in comments: Token management - handled in [YARN-941] Scheduler hints not related to node categories or anti-affinity (opportunistic scheduling, etc) - this does strike me as something better handled via the duration route et al. [YARN-2877] [YARN-1051] and not something which needs to be replicated here I think that really just leaves the progress bar (and potentially other display related items). This is covered by [YARN-1079] I suggest, then, that we either rescope this jira to providing the lifecycle information as an application tag [https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14039679page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039679] as suggested by [~zjshen] early on or close it and cover the work as part of [YARN-1079]. I originally objected to that approach on the basis that tags appeared to be a display type feature which did not fit this effort, but if rescoped as I'm proposing, it becomes such a feature, and I think that approach is now a good fit. Thoughts? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. 
This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294557#comment-14294557 ] Craig Welch commented on YARN-1039: --- [~chris.douglas] what's the proper duration for a service which does not have a pre-defined lifetime? This distinction is not really about how long will it run but more about what is the lifecycle of this app - as [~ste...@apache.org] points out, is it session or batch oriented (something which has a defined set of work, so it has a notion of progress to completion) or is it a running process with an indeterminate/unknown lifetime which handles whatever work is sent its way (a service). This is really the distinction needed here - it's a qualitative difference regarding a lifecycle, the notion of an enumeration of lifecycle types makes sense for this. Users will often have no idea how long their application will run, but they will generally have a clear notion of its lifecycle. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294441#comment-14294441 ] Craig Welch commented on YARN-2495: --- [~Naganarasimha] I understand the desire to have the feature, it does seem more like a convenience / simplification measure than an introduction of something that can't be otherwise accomplished, but convenience and simplification can matter a great deal, so why not :-) bq. There will be always confusion ... Do we need to And or OR What I'm getting at is that I think there are just too many switches and knobs in play once you consider the flag DECENTRALIZED_CONFIGURATION_ENABLED in addition to the other configuration relationships (defining the configuration script to do the update from nodes), I think that the act of configuring something to send node labels from the node manager is sufficient intent that it is the desired behavior, and the additional DECENTRALIZED_CONFIGURATION_ENABLED is just an extra ceiling for someone to bump their head against while setting this up. wrt supporting add vs replace behavior, I think that as it's described now the idea is to just support replace from the node script, meaning that it will effectively be the only definition used when it is active (which is fine for many cases). In the future, if there is a need for hybrid configuration of labels that can become an enhancement. An option would be to use a different parameter for a script which will do add and remove instead of replace, and then say have it return + or - (for add and remove) with the label instead of a fixed set of labels for replacement. From what I see above, the replacement approach, where the script determines the full label set, looks to be the immediate need - the other could be added in a compatible way later if it was needed. 
Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
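The incremental add/remove extension floated in the comment above (a script emitting labels prefixed with + or - instead of a full replacement set) could be sketched as follows. The line protocol and class name are pure assumptions for illustration; no such script contract exists in the source:

```java
// Hypothetical sketch of the "+/- prefix" label-script protocol
// suggested above: each output line adds or removes one label from the
// node's current set, rather than replacing the whole set.
import java.util.HashSet;
import java.util.Set;

class LabelScriptOutput {
    static Set<String> apply(Set<String> current, String[] scriptLines) {
        Set<String> result = new HashSet<>(current);
        for (String line : scriptLines) {
            if (line.length() < 2) continue;      // ignore malformed lines
            String label = line.substring(1).trim();
            if (line.startsWith("+")) {
                result.add(label);                // incremental add
            } else if (line.startsWith("-")) {
                result.remove(label);             // incremental remove
            }
        }
        return result;
    }
}
```

As the comment notes, this could coexist with the replace-style script by living behind a separate parameter, keeping the change compatible.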
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284325#comment-14284325 ] Craig Welch commented on YARN-1039: --- Another thought - if we do need this kind of flag, I think we should detach the notion from duration or long life as such - I think it's more about service vs batch - where a service's duration is not necessarily related to any preset notion of a work item it will start, work on, and complete - it will be started to handle work which is given to it, of unknown quantity ( potentially many different items) and stopped when no longer needed - it's not so much about the duration as the lifecycle (a batch operation may have a longer runtime than a service, for example). So, I'd suggest dropping the temporal flavor and going with service vs batch, or something along those lines. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
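The "enumeration of lifecycle types" suggested in the comments above could look something like this minimal sketch. The enum and its members are illustrative, not an actual YARN API:

```java
// Minimal sketch of the service-vs-batch lifecycle distinction proposed
// above: a qualitative enum rather than a duration value. Names are
// assumptions for illustration only.
enum ApplicationLifecycle {
    BATCH,   // defined set of work; has a notion of progress to completion
    SERVICE; // open-ended; handles work of unknown quantity until stopped

    boolean hasProgress() {
        // Only batch-style lifecycles have a meaningful progress bar,
        // which is the display concern the discussion ties to YARN-1079.
        return this == BATCH;
    }
}
```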
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284304#comment-14284304 ] Craig Welch commented on YARN-1039: --- As I understand it (and, I may be wrong on this...) the original intent of this jira was to provide a boolean switch to control a set of behaviors expected to be important for a long running service - among other things, what sort of nodes to schedule on and how to handle logs. This could be on a sliding scale based on duration, but I'm not sure that works so well - at what duration do we start to change how we handle logs and / or where we schedule things? While related, I think that converting this from a boolean to a range will make it more difficult to use it for the intended usecase. I also think that packing together all of these behaviors into one parameter might be a negative overall. I do think, to [~john.jian.fang] 's point, as of now using this to determine where to schedule tasks to avoid spot instances and the like has really been superseded by Node Labels and I do not think we should add additional functionality for that here - Node Labels is really the way to handle that part of the usecase. That leaves, potentially among other things, affinity/anti-affinity issues (not scheduling long running tasks together/scheduling them together) and log handling (how do we tell the system we want log handling for a long running service, if, in fact, the system needs to be told that). I submit that it would be better to have separate solutions to each of these needs which can be bundled together to achieve the overall usecase, as I think that will provide better control without adding too much complexity for the end user. Which means that we would break this out into affinity/anti-affinity and logging configuration. 
We could always have a single parameter (like this one) which sets the others for convenience, I'm not sure we'll actually need it, but I do think that splitting out the bundled functionality into individual items (some of which may already be being worked on elsewhere) is the way to go. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281058#comment-14281058 ] Craig Welch commented on YARN-1680: --- Thanks for the update, [~airbots], a couple thoughts: I created [YARN-2848] in the hopes that it would help us to build a solution which could share functionality between various items with similar needs, so that the solution we come up with is built with that in mind. That said, I think we will need to build the solutions independently, and there's no need to do them all at the same time. -re Every time, App asks for blacklist addition, we check whether the nodes in addition are in cluster blacklist or not (O(m), m is the nodes in blacklist addition). If so, remove this node from addition. Unfortunately, I don't think that this can be solved with checks during addition and removal - I believe that we will need to keep a persistent picture of all blacklisted nodes for an application regardless of their cluster state because the two can vary independently and changes after a blacklist request may invalidate things (for example, cluster blacklists just before app blacklists, the app blacklist request is discarded, the cluster reinstates but the app still cannot use the node for reasons different from the nodes cluster availability - we will still include that node in headroom incorrectly...). I also think that, as suggested in [YARN-2848], the only approach I see working for all states is one where there is a last-change indicator of some sort active for the cluster in terms of its node composition which is held by the application and, when it has updated past the application's last calculation for app cluster resource (in this case, the one which omits blacklisted nodes), it re-evaluates state to determine a new app cluster resource which it then uses (until a reevaluation is required, again). 
This should enable the application to have accurate headroom information regardless of the timing of changes, and allow for the more complex evaluations which may be needed (rack blacklisting, etc.) while minimizing the frequency of those evaluations. I don't think it is necessarily required for blacklisting, but it's worth noting that this could include offloading some of the calculation to the application master (via more informational APIs / library functions for calculation) to distribute the cost outward. Again, not necessarily for this case, but I wanted to mention it as I think it is an option now or later on. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A job's running reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks were killed), so the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers, because the headroom used in the reducer-preemption calculation still counts the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResource it returns still reflects total cluster free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
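The "last-change indicator" idea above can be sketched roughly as follows. This is a hypothetical illustration, not YARN code: `AppHeadroomTracker` and its members are invented names, and memory is modeled as a plain `int`. The application keeps its own blacklist independently of cluster state and caches a blacklist-adjusted cluster resource, recomputing it only when the cluster's node-composition version has moved past the version it last saw (or its own blacklist changed):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HeadroomSketch {

  // Hypothetical per-application tracker; names are illustrative only.
  static class AppHeadroomTracker {
    private final Set<String> blacklist = new HashSet<>();
    private long seenClusterVersion = -1; // -1 forces recomputation
    private int cachedMemory;

    /** Memory available to this app, excluding its blacklisted nodes. */
    int headroomMemory(long clusterVersion, Map<String, Integer> liveNodeMemory) {
      if (clusterVersion != seenClusterVersion) { // recompute only on change
        cachedMemory = 0;
        for (Map.Entry<String, Integer> e : liveNodeMemory.entrySet()) {
          if (!blacklist.contains(e.getKey())) {
            cachedMemory += e.getValue();
          }
        }
        seenClusterVersion = clusterVersion;
      }
      return cachedMemory;
    }

    void addBlacklistedNode(String node) {
      blacklist.add(node);
      seenClusterVersion = -1; // force re-evaluation on next read
    }
  }

  public static void main(String[] args) {
    // The scenario from the issue: 4 NodeManagers with 8GB (8192 MB) each.
    Map<String, Integer> nodes = new HashMap<>();
    nodes.put("nm-1", 8192);
    nodes.put("nm-2", 8192);
    nodes.put("nm-3", 8192);
    nodes.put("nm-4", 8192);
    AppHeadroomTracker tracker = new AppHeadroomTracker();
    System.out.println(tracker.headroomMemory(1, nodes)); // 32768 (full cluster)
    tracker.addBlacklistedNode("nm-4");
    System.out.println(tracker.headroomMemory(1, nodes)); // 24576 (excludes NM-4)
  }
}
```

Because the expensive walk over the node set runs only when the version stamp moves, the per-heartbeat cost stays low while the headroom still reflects both cluster changes and app-level blacklisting.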
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275700#comment-14275700 ] Craig Welch commented on YARN-2637: --- Regarding the findbugs report for LeafQueue.lastClusterResource - access to lastClusterResource appears to be synchronized everywhere except getAbsActualCapacity, which I don't actually see being used anywhere - I'm going to add a findbugs exception and a comment on the method so that if it is used in the future, synchronization can be addressed -re [~leftnoteasy] 's latest: -re 1 - actually, user limits are based on absolute queue capacity rather than max capacity - this is apparently intentional because, although a queue can exceed its absolute capacity, an individual user is not supposed to, hence my basing the user AM limit on the absolute capacity. The approach I use fits with the original logic in CSQueueUtils, which allows a user the greater of the userlimit share of the absolute capacity or a 1/#-active-users split (so if fewer users are active than would reach the userlimit, they can use the full queue absolute capacity); the only correction is that we use the actual value of resources used by application masters instead of one based on minalloc -re 2 - Actually, the snippet provided is not quite correct; some schedulers provide a CPU value as well. In any case, for encapsulation reasons it's better to use the scheduler's value in case its means of determining this changes in the future. -re 3 - I can't see this making the slightest difference in understandability - since these tests' paths don't populate the rmApps, I would simply be individually putting mocked ones into the map instead of using the single mock + matcher for all the apps. The way it is seems clearer to me, as all of the mocking is together instead of distributing the (mock activity, if not mock framework...) 
process of putting mocked rmApps into the collection throughout the test -re 4 - interesting, those were already there, but I also couldn't see why. The tests pass fine without them, so I removed them -re 5 - removed uploading updated patch in a few maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.36.patch, YARN-2637.38.patch, YARN-2637.39.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext();) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() + " from user: "
        + application.getUser() + " activated in queue: " + getQueueName());
  }
}
{code}
For example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the computed #am that can be launched is 200; if each user's AM actually uses 5M (> minimum_allocation), all apps can still be activated, and they will occupy all of the queue's resources instead of only max_am_resource_percent of the queue.
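The violation in the example above comes from capping an app *count* derived from `minimum_allocation` rather than the actual AM resource in use. A minimal sketch of the fix direction (not the actual patch; `canActivate` and the memory-only model are illustrative assumptions) admits a new AM only while real AM usage plus the new AM's request fits within `queueCapacity * maxAmResourcePercent`:

```java
public class AmLimitSketch {

  // Hypothetical check: compare actual AM memory in use against the
  // queue's AM budget, instead of counting apps against a precomputed
  // minimum_allocation-based maximum.
  static boolean canActivate(int usedAmMemory, int amMemory,
                             int queueCapacityMemory, double maxAmResourcePercent) {
    int amLimit = (int) (queueCapacityMemory * maxAmResourcePercent);
    return usedAmMemory + amMemory <= amLimit;
  }

  public static void main(String[] args) {
    // Queue of 1024 MB with a 20% AM budget => 204 MB for AMs.
    // Each AM really asks for 5 MB (above the 1 MB minimum allocation).
    int used = 0;
    int activated = 0;
    while (canActivate(used, 5, 1024, 0.2)) {
      used += 5;
      activated++;
    }
    // A count-based limit would have allowed 200 activations.
    System.out.println(activated); // 40
  }
}
```

With the resource-based check, only about 40 of the 5 MB AMs activate before the 20% budget is exhausted, instead of the 200 that the count-based formula permits.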
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.40.patch
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.39.patch Now with web ui entries max am and max am user resource + application limit tests
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.36.patch Should be down to one failing test, let's see
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.32.patch Check tests using absoluteCapacity for userAmLimit
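The "absoluteCapacity for userAmLimit" approach discussed in the comments amounts to giving each user the greater of their userlimit share of the queue's AM budget or an equal split among active users. A hedged sketch (invented names, memory-only model; not CapacityScheduler code):

```java
public class UserAmLimitSketch {

  /**
   * Hypothetical per-user AM limit: the queue's AM budget is derived from
   * its *absolute* capacity (not max capacity), and a user may use the
   * greater of the userlimit share of that budget or an equal split
   * among the currently active users.
   */
  static int userAmLimit(int clusterMemory, double absoluteCapacity,
                         double maxAmResourcePercent,
                         double userLimit,        // e.g. 0.25 for 25%
                         double userLimitFactor,
                         int activeUsers) {
    double queueAmBudget = clusterMemory * absoluteCapacity * maxAmResourcePercent;
    double byUserLimit = queueAmBudget * userLimit * userLimitFactor;
    double byEqualSplit = queueAmBudget / Math.max(1, activeUsers);
    return (int) Math.max(byUserLimit, byEqualSplit);
  }

  public static void main(String[] args) {
    // 32768 MB cluster, queue absolute capacity 50%, 10% AM budget
    // => ~1638 MB for AMs. With userlimit 25% the share is ~409 MB, but
    // with only 2 active users the equal split (~819 MB) is larger, so
    // each user may use up to 819 MB for AMs.
    System.out.println(userAmLimit(32768, 0.5, 0.1, 0.25, 1.0, 2)); // 819
  }
}
```

This mirrors the comment's reasoning: when fewer users are active than would saturate the userlimit, each may use a larger share of the queue's AM budget, and the budget itself is measured in actual AM resources rather than a minalloc-based count.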
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.31.patch See what happens when maxActiveApplications and maxActiveApplicationsPerUser are removed altogether
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.30.patch userAMLimit logic included as well, now with a test :-)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.27.patch Patch tests which fail when the null check for rmContext.getScheduler is not present in FiCaSchedulerApp
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.28.patch
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267030#comment-14267030 ] Craig Welch commented on YARN-2637: --- Findbugs was the result of changing the ratio of synchronized to unsynchronized accesses, which hit the findbugs limits, but not the pattern itself, which looks fine, so I added a findbugs exclusion. TestFairScheduler passes on my box with the change, so it's build-server related / not a real issue. I was not originally planning to address the max AM percent per user, as that wasn't the issue we kept encountering, but I forgot to mention this / edit the JIRA to reflect it. However, I'm going to see what the impact would be of adding that now; then we can decide to include it or move it to its own JIRA.
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.29.patch Took a go at adding the user AM limit as well (needs further verification/testing); see the test impact.

maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
Key: YARN-2637
URL: https://issues.apache.org/jira/browse/YARN-2637
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch

Currently, the number of AMs in a leaf queue is calculated as follows:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated as follows:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext();) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() + " from user: "
        + application.getUser() + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource AMs can use is 200M. Assuming minimum_allocation = 1M, 200 AMs can be launched; if each AM actually uses 5M (> minimum_allocation), all apps can still be activated, and they can occupy all of the queue's resources instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
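The violation in the example above can be sketched numerically. The following is a minimal illustration with made-up helper names and values, not actual CapacityScheduler code: the count-based check derived from minimum_allocation admits far more AM resource than the queue's AM budget.

```java
// Numeric sketch of the reported violation (illustrative values only;
// this is not CapacityScheduler code).
public class AmLimitDemo {

    // Queue AM budget: max_am_resource = queue_max_capacity * maximum_am_resource_percent
    static int maxAmResourceMb(int queueCapacityMb, double maxAmPercent) {
        return (int) (queueCapacityMb * maxAmPercent);
    }

    // App-count limit derived from minimum_allocation, as in the formula above.
    static int maxAmNumber(int maxAmResourceMb, int minAllocationMb) {
        return maxAmResourceMb / minAllocationMb;
    }

    public static void main(String[] args) {
        int budgetMb = maxAmResourceMb(1024, 0.2);   // ~204 MB AM budget for a 1G queue
        int admitted = maxAmNumber(budgetMb, 1);     // 204 AMs admitted by the count check
        int actuallyUsedMb = admitted * 5;           // each AM really uses 5 MB
        // The count-based check admits far more AM resource than the budget:
        System.out.println(actuallyUsedMb + " > " + budgetMb + " : "
            + (actuallyUsedMb > budgetMb));          // prints "1020 > 204 : true"
    }
}
```

This is why the patches on this JIRA move toward tracking the AM resource actually consumed, rather than counting applications against minimum_allocation.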
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.26.patch Reformatted some sections of TestLeafQueue; commented out the null check for rmContext.getScheduler() in FiCaSchedulerApp to see how widespread that condition is in the tests.
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265480#comment-14265480 ] Craig Welch commented on YARN-2637: ---
bq. Regarding null checks in FiCaSchedulerApp. Since scheduler assumes application is in running state when adding FiCaSchedulerApp. It is a big issue if RMApp cannot be found at that time. So comparing to just ignore such error, I think you need throw exception (if that exception will not cause RM shutdown) and log such error.
I'm not quite sure how to phrase this differently to get the point across: it is already the case, throughout the many mocking points which interact with this code, that the RMApp may be null at this point (if that were not the case it would not be necessary to check for it). As I mentioned previously, the ResourceManager itself checks for this case. I am not introducing the mocking which resulted in this state, or even the existing checks for it in non-test code; I'm receiving this state and carrying it forward in the same way as it has been done elsewhere (and, again, not only in tests). Changing this does not belong in the scope of this JIRA, because it represents a rationalization/overhaul of mocking throughout this area (ResourceManager, schedulers); it is non-trivial and not specific to this change. Feel free to create a separate JIRA to improve the mocking throughout the code. The separate null check for the AM resource request is necessitated by the apparently intentional behavior of unmanaged AMs.
bq. And when this is possible? + if (rmContext.getScheduler() != null)
Again, in existing test paths; existing code is tolerant of this as well, and I'm merely carrying it forward. It would belong in the new JIRA too, were one opened.
bq. \t in leafqueue
I've checked, and the spacing is consistent with the existing spacing in the file.
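The null-tolerant behavior described in the comment above can be sketched as follows. The stub types here are hypothetical stand-ins for YARN's RMApp and ResourceRequest classes, so the pattern is runnable on its own; this is not the actual FiCaSchedulerApp constructor.

```java
// Sketch of the null-tolerant AM-resource lookup discussed above.
// ResourceRequest / RMApp are simplified stand-ins for the real YARN types.
public class AmResourceLookup {

    static class ResourceRequest {
        final int memoryMb;
        ResourceRequest(int memoryMb) { this.memoryMb = memoryMb; }
    }

    static class RMApp {
        final ResourceRequest amResourceRequest;
        RMApp(ResourceRequest r) { this.amResourceRequest = r; }
        ResourceRequest getAMResourceRequest() { return amResourceRequest; }
    }

    // Mocked tests may register no RMApp at all, and unmanaged AMs carry no
    // AM resource request, so both nulls are tolerated rather than fatal.
    static int amMemoryMb(RMApp app) {
        if (app == null || app.getAMResourceRequest() == null) {
            return 0; // fall back to "no AM resource" instead of throwing
        }
        return app.getAMResourceRequest().memoryMb;
    }

    public static void main(String[] args) {
        System.out.println(amMemoryMb(null));                                   // 0
        System.out.println(amMemoryMb(new RMApp(null)));                        // 0
        System.out.println(amMemoryMb(new RMApp(new ResourceRequest(512))));    // 512
    }
}
```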
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.25.patch
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263608#comment-14263608 ] Craig Welch commented on YARN-2637: ---
bq. I think there should at least one AM can be launched in each queue ... MockRM test config settings
That's been the case since switching to approach 2; some tests need to start 1 app in a queue ;) In any case, I've removed the MockRM test config settings. It's only needed in a few tests now, so I'm setting it in those tests directly. (done)
bq. -re maximumActiveApplications ... MAXIMUM_ACTIVE_APPLICATIONS_SUFFIX
I removed this new configuration point. It is no longer possible to directly control how many apps start in a queue, since the AMs are not all the same size, so that can't actually be controlled now outside of testing (it could be before; now it can't). However, the cases I recall using it for were all workarounds for the fact that the max AM percent wasn't working properly, so hopefully this won't be missed. (done)
-re null checks in the FiCaSchedulerApp constructor: the ResourceManager itself checks for null RMApps (ResourceManager.java, ~line 830), so this is a pre-existing, tolerated case and I'm not going to address it. getAMResourceRequest() can also be null for unmanaged AMs. I've reduced the null checks for the app to just these two cases, but those checks should remain. (partly done/remaining should stay as-is)
All the build quality checks and tests are passing; not sure why the overall is red, I think it's a build server issue...
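The per-user ceiling mentioned in this thread composes the queue-level AM limit with userlimit and userlimit_factor (the third line of the formula in the issue description). A minimal arithmetic sketch, with illustrative numbers only (the real userlimit handling lives in LeafQueue):

```java
// Arithmetic sketch of the per-user AM ceiling from the issue description
// (hypothetical values; not LeafQueue code).
public class UserAmLimitDemo {

    // #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
    static int perUserAmNumber(int maxAmNumber, double userLimit, double userLimitFactor) {
        return (int) (maxAmNumber * userLimit * userLimitFactor);
    }

    public static void main(String[] args) {
        int maxAmNumber = 204;   // queue-level app-count limit from the earlier formula
        // With userlimit = 0.25 and userlimit_factor = 2.0, each user may
        // activate up to 102 AMs, still counted against minimum_allocation,
        // so the same size-mismatch violation applies per user.
        System.out.println(perUserAmNumber(maxAmNumber, 0.25, 2.0)); // prints 102
    }
}
```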
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.23.patch
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.22.patch
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.21.patch
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.20.patch