[ https://issues.apache.org/jira/browse/YUNIKORN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikandan R resolved YUNIKORN-2270.
------------------------------------
    Fix Version/s: 1.5.0
       Resolution: Fixed

Merged to master

> GPU Preemption is not triggered as expected when all available GPUs are used
> ----------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2270
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2270
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0
>
>
> I am testing an important preemption scenario for GPUs. The scenario is
> designed as follows.
> The queue structure is pretty simple:
> {code}
> root.a (min=100, max=300)
> root.b (min=0, max=300)
> {code}
> The cluster has a total of 300 GPUs available, with no autoscaling.
> Steps to reproduce:
> 1. Create 600 pods in the root.b queue, each needing 1 GPU. This consumes all
> 300 GPUs available in the cluster and leaves 300 pods pending.
> 2. Create 100 pods in the root.a queue, each needing 1 GPU. The expectation is
> that queue a will preempt 100 GPUs from queue b to reach its guarantee.
> Observation: only a small number of pods in queue a preempted resources from
> queue b and got started, and the result is not stable. Queue a could not reach
> its guaranteed resources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
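
For reference, the expected outcome of the reproduction steps in the issue description can be checked with a short calculation. This is only a sketch of the arithmetic behind the reporter's expectation, not YuniKorn's preemption algorithm. The function name and parameters are hypothetical, and it assumes that free capacity is used before anything is preempted and that all of queue b's usage is preemptable because its guarantee is 0.

```python
# Hypothetical helper, not part of YuniKorn: it only works through the
# numbers given in the issue description.
def expected_preemption(total_gpus, guaranteed_a, used_a, used_b, pending_a):
    """Return how many GPUs queue a would need to reclaim from queue b
    in order to reach its guarantee."""
    # Capacity not used by either queue.
    free = total_gpus - used_a - used_b
    # Portion of queue a's demand that falls within its guarantee.
    shortfall = max(0, min(guaranteed_a, used_a + pending_a) - used_a)
    # Assumption: free capacity is consumed first, so only the rest
    # has to come from preemption.
    needed = max(0, shortfall - free)
    # Assumption: queue b has min=0, so all of its usage is above its
    # guarantee and can be preempted.
    return min(needed, used_b)


# Scenario from the report: 300 GPUs, all held by root.b, 100 pods pending in root.a.
print(expected_preemption(total_gpus=300, guaranteed_a=100,
                          used_a=0, used_b=300, pending_a=100))
```

With the values from the report this gives 100, which matches the stated expectation that queue a preempts 100 GPUs from queue b.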