[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
    [ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802684#comment-17802684 ]

Shilun Fan commented on YARN-8149:
----------------------------------

Bulk update: moved all 3.4.0 non-blocker issues; please move this back if it is a blocker. Retargeted to 3.5.0.

> Revisit behavior of Re-Reservation in Capacity Scheduler
> --------------------------------------------------------
>
>                 Key: YARN-8149
>                 URL: https://issues.apache.org/jira/browse/YARN-8149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Major
>
> Frankly speaking, I'm not sure why we need the re-reservation. The formula is
> not that easy to understand:
> Inside:
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#shouldAllocOrReserveNewContainer}}
> {code:java}
> starvation = (re-reservation / #reserved-container) *
>              (1 - min(requested-resource / max-alloc,
>                       (max-alloc - min-alloc) / max-alloc))
> should_allocate = starvation + requiredContainers - reservedContainers > 0{code}
> I think we should be able to remove the starvation computation; just checking
> requiredContainers > reservedContainers should be enough.
> In a large cluster, we can easily overflow re-reservation to MAX_INT, see
> YARN-7636.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
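For readers trying to follow the formula in the issue description, here is a minimal Java sketch of the check and of the proposed simplification. It is not the scheduler's code: the class name, method names, and plain int/float parameters are all assumptions made for illustration. In particular, `nodeFactor` stands in for requested-resource / max-alloc and `minAllocFactor` for (max-alloc - min-alloc) / max-alloc.

```java
final class StarvationSketch {
    /**
     * Starvation term as described in the issue: re-reservations per reserved
     * container, damped by how large the request is relative to max-alloc.
     * All parameter names are hypothetical.
     */
    public static int starvation(int reReservations, int reservedContainers,
                                 float nodeFactor, float minAllocFactor) {
        if (reservedContainers <= 0) {
            return 0; // nothing reserved yet, so there is no starvation bonus
        }
        float damping = 1.0f - Math.min(nodeFactor, minAllocFactor);
        return (int) ((reReservations / (float) reservedContainers) * damping);
    }

    /** should_allocate = starvation + requiredContainers - reservedContainers > 0 */
    public static boolean shouldAllocOrReserve(int required, int reserved,
                                               int reReservations,
                                               float nodeFactor, float minAllocFactor) {
        int s = starvation(reReservations, reserved, nodeFactor, minAllocFactor);
        // long arithmetic so that this sketch itself cannot wrap around
        return (long) s + required - reserved > 0;
    }

    /** The simplification proposed in the issue: drop the starvation term. */
    public static boolean simplifiedCheck(int required, int reserved) {
        return required > reserved;
    }

    public static void main(String[] args) {
        // One pending request, one existing reservation, 10 re-reservations,
        // and a request that is half of max-alloc (nodeFactor = 0.5).
        System.out.println(shouldAllocOrReserve(1, 1, 10, 0.5f, 0.9f)); // true
        System.out.println(simplifiedCheck(1, 1));                      // false
    }
}
```

With these inputs the starvation term is 5, so the full check still permits another reservation where the simplified check would not. That difference is what the discussion in this thread is about.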
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
    [ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471327#comment-16471327 ]

Wangda Tan commented on YARN-8149:
----------------------------------

Per the previous discussion, downgraded the priority and removed 3.1.1 from the target version.

> Revisit behavior of Re-Reservation in Capacity Scheduler
> --------------------------------------------------------
>
>                 Key: YARN-8149
>                 URL: https://issues.apache.org/jira/browse/YARN-8149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Major
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
    [ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436488#comment-16436488 ]

Wangda Tan commented on YARN-8149:
----------------------------------

[~tgraves], preemption for large reserved containers is already handled by the existing code path. It won't guarantee that every reserved container can be satisfied, but it can alleviate the problem a lot: https://issues.apache.org/jira/browse/YARN-4390.

I agree that we cannot remove this method any time soon (sadly). Let's think more about how to do reservation + preemption better. I added moveReservedContainer (swap) to CS as part of YARN-5864. It is possible that we can use that method to do better reservation.

> Revisit behavior of Re-Reservation in Capacity Scheduler
> --------------------------------------------------------
>
>                 Key: YARN-8149
>                 URL: https://issues.apache.org/jira/browse/YARN-8149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
    [ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436366#comment-16436366 ]

Thomas Graves commented on YARN-8149:
-------------------------------------

Thinking about this a little more: even with the current preemption on, I don't think preemption is smart enough to keep starvation from happening. If preemption were smart enough to kill enough containers on a reserved node so that the big container actually gets scheduled there, that might be OK. But last time I checked it doesn't do that. Without that, or another way to prevent starvation, I wouldn't want to remove this.

I think adding a config would be all right, but if anyone finds it useful you can't remove it, and it would just be an extra config. If we have other ideas to simplify this or make it better, great, we should look at them. Or if there is a way for us to get stats on whether this is useful, we could add those, run with them, and determine whether we should remove it.

> Revisit behavior of Re-Reservation in Capacity Scheduler
> --------------------------------------------------------
>
>                 Key: YARN-8149
>                 URL: https://issues.apache.org/jira/browse/YARN-8149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
    [ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436327#comment-16436327 ]

Wangda Tan commented on YARN-8149:
----------------------------------

Thanks [~tgraves] for the suggestions. To your questions:
{quote}are you going to do anything with starvation then or allocate a certain % more than what is required?
{quote}
Not yet.
{quote}Is in queue preemption on by default?
{quote}
No, but we see a large number of users / clusters enabling it.

Probably what we should do is make it configurable, test it in a large cluster over a long run, and remove it only once we're confident about it.

> Revisit behavior of Re-Reservation in Capacity Scheduler
> --------------------------------------------------------
>
>                 Key: YARN-8149
>                 URL: https://issues.apache.org/jira/browse/YARN-8149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
    [ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436295#comment-16436295 ]

Thomas Graves commented on YARN-8149:
-------------------------------------

Are you going to do anything with starvation then, or allocate a certain % more than what is required?

I am hesitant to remove this without doing some major testing. I haven't had a chance to look at the latest code to investigate. It might be fine now that we continue looking at other nodes after a reservation, whereas originally that didn't happen.

Is in-queue preemption on by default?

> Revisit behavior of Re-Reservation in Capacity Scheduler
> --------------------------------------------------------
>
>                 Key: YARN-8149
>                 URL: https://issues.apache.org/jira/browse/YARN-8149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
    [ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436109#comment-16436109 ]

Wangda Tan commented on YARN-8149:
----------------------------------

Thanks [~cheersyang] for pointing to the original Jira. I would say this could be more harmful than useful: re-reservation can be as large as MAX_INT, which means an app could reserve on many nodes even if it has only one pending large resource request.

With preemption enhancements such as surgical preemption, I think we don't need this any more. I still want to hear thoughts from others before taking action.

> Revisit behavior of Re-Reservation in Capacity Scheduler
> --------------------------------------------------------
>
>                 Key: YARN-8149
>                 URL: https://issues.apache.org/jira/browse/YARN-8149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical
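To put rough numbers on the concern in the comment above, the following Java sketch estimates how many reservations an app with a single pending request could accumulate for a given re-reservation count. It is only an illustration under stated assumptions: the class and method names are hypothetical, the re-reservation count is held fixed while reservations grow (the real counter changes over time), and `damping` stands in for the 1 - min(...) factor of the formula in the issue description.

```java
final class ReReservationGrowth {
    /**
     * Returns the smallest number of reserved containers at which
     * starvation + required - reserved > 0 stops holding, for an app with
     * exactly one pending request. Names and parameters are hypothetical.
     */
    public static int reservationsBeforeRejection(int reReservations, float damping) {
        final int required = 1; // a single outstanding large request
        int reserved = 0;
        while (reserved < Integer.MAX_VALUE) {
            int starvation = 0;
            if (reserved > 0) {
                starvation = (int) ((reReservations / (float) reserved) * damping);
            }
            // long arithmetic so the sketch itself cannot wrap around
            if ((long) starvation + required - reserved <= 0) {
                break; // the check would now refuse a further reservation
            }
            reserved++;
        }
        return reserved;
    }

    public static void main(String[] args) {
        // Without any re-reservations the app holds a single reservation.
        System.out.println(reservationsBeforeRejection(0, 0.5f));         // 1
        // With one million re-reservations and damping 0.5 it can hold 708.
        System.out.println(reservationsBeforeRejection(1_000_000, 0.5f)); // 708
        System.out.println(reservationsBeforeRejection(Integer.MAX_VALUE, 0.5f));
    }
}
```

Under these assumptions the permitted number of reservations grows roughly with the square root of the re-reservation count, which fits the observation that a long-starved app can end up reserving on many nodes for one request.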
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
    [ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16435550#comment-16435550 ]

Weiwei Yang commented on YARN-8149:
-----------------------------------

Hi [~leftnoteasy],

I am not sure it is a good idea to remove this, since this is legacy code that has been there for years; it looks like it was added in MAPREDUCE-2917. The original formula was
{code}
starvation = (int)((application.getReReservations(priority) / reservedContainers) * + absoluteCapacity);
{code}

> Revisit behavior of Re-Reservation in Capacity Scheduler
> --------------------------------------------------------
>
>                 Key: YARN-8149
>                 URL: https://issues.apache.org/jira/browse/YARN-8149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
    [ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434342#comment-16434342 ]

Wangda Tan commented on YARN-8149:
----------------------------------

[~jlowe] / [~eepayne] / [~cheersyang] / [~Tao Yang] / [~sunilg], could you share your thoughts on this? If we can remove this, the reservation logic can be simplified a lot.

> Revisit behavior of Re-Reservation in Capacity Scheduler
> --------------------------------------------------------
>
>                 Key: YARN-8149
>                 URL: https://issues.apache.org/jira/browse/YARN-8149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical