Chaoran Yu created YUNIKORN-704:
-----------------------------------

             Summary: Scheduling of DaemonSet pods may fail
                 Key: YUNIKORN-704
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-704
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: shim - kubernetes
            Reporter: Chaoran Yu
         Attachments: fluent-bit-describe.yaml, fluent-bit.yaml

We sometimes see DaemonSet pods fail to be scheduled. Please see the attached 
files for the YAML and _kubectl describe_ output of one such pod. We originally 
suspected that 
[node reservation|https://github.com/apache/incubator-yunikorn-core/blob/v0.10.0/pkg/scheduler/context.go#L41] 
was to blame, but even after setting the DISABLE_RESERVATION environment 
variable to true, we still see such scheduling failures. The issue is 
especially severe when K8s nodes are under disk pressure, which causes many 
pods to be evicted. Newly created pods stay in the Pending state indefinitely. 
We have to temporarily uninstall YuniKorn and let the default scheduler 
schedule these pods. 

This issue is critical because many important pods belong to a DaemonSet, 
such as Fluent Bit, a common logging solution. This is probably the last 
remaining roadblock before we are confident enough to let YuniKorn entirely 
replace the default scheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org
