Hello,

I am running a Flink job in application mode on Kubernetes. It is deployed
as a FlinkDeployment and its lifecycle is managed by the flink-k8s-operator.
The autoscaler is enabled with the following configuration:

job.autoscaler.enabled: true
job.autoscaler.metrics.window: 5m
job.autoscaler.stabilization.interval: 1m
job.autoscaler.target.utilization: 0.6
job.autoscaler.target.utilization.boundary: 0.2
pipeline.max-parallelism: 60
jobmanager.scheduler: adaptive
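
For context, these options are set in the FlinkDeployment's
flinkConfiguration map. A rough sketch of the manifest (name, image, paths
and values are illustrative, not my exact ones):

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-flink-job                 # illustrative name
spec:
  image: my-registry/my-job:latest   # illustrative image
  flinkVersion: v1_17                # illustrative version
  serviceAccount: flink
  flinkConfiguration:
    # all values are strings; the autoscaler/scheduler options listed
    # above go here, for example:
    job.autoscaler.enabled: "true"
    jobmanager.scheduler: "adaptive"
  job:
    jarURI: local:///opt/flink/usrlib/my-job.jar   # illustrative path
    parallelism: 2                                 # illustrative initial parallelism
    upgradeMode: last-state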

During a scale-up event, the autoscaler raises the parallelism of one of
the job vertices. This triggers a batch of new task managers to be
scheduled on the EKS cluster (the node group has an attached ASG). Only
some of the requested TM pods get scheduled before the cluster runs out of
resources; the rest remain in the "Pending" state indefinitely, and the
job is stuck in a restart loop forever.

1. Shouldn't the adaptive scheduler reduce the vertex parallelism, given
that the required slots/TMs are not available?
2. When I looked at the pods stuck in the Pending state, I found them
reporting the following events:

  Warning  FailedScheduling   4m55s (x287 over 23h)   default-scheduler
  0/5 nodes are available: 1 Insufficient cpu, 1 node(s) didn't match Pod's
  node affinity/selector, 3 Insufficient memory. preemption: 0/5 nodes are
  available: 1 Preemption is not helpful for scheduling, 4 No preemption
  victims found for incoming pod.

  Normal   NotTriggerScaleUp  3m26s (x8555 over 23h)  cluster-autoscaler
  pod didn't trigger scale-up: 1 max node group size reached

The FailedScheduling warning suggests that the "default-scheduler" is being
used. Why is that the case even though the adaptive scheduler is configured?

I'd appreciate it if you could shed some light on why this might be happening.

Thanks
Chetas
