Hi everyone,
I’m running Flink 2.2 + K8s Operator 1.14 with the ForSt async state backend, 
checkpoints in S3, working state around 1TB. 

With upgradeMode last-state and using the adaptive scheduler we observe:
- Downscales (e.g. parallelism 24 -> 12) take roughly 2 hours from spec 
change to RUNNING, with no events processed during that time.
- Upscales in the same flow are much faster, typically 2-10 min.

The relevant ForSt / checkpointing settings are:
- use-ingest-db-restore-mode: true
- incremental-restore-async-compact-after-rescale: true
- state-recovery.claim-mode: CLAIM 
- NATIVE savepoint format, incremental + compressed checkpoints
- pipeline.max-parallelism: 120
- ForSt cache size-based-limit 350GB on a 500Gi gp3 volume per TM. 
- I’m running one slot per TM and have tried parallelisms between 12 and 24.
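
For reference, here is roughly how that looks in our flink-conf (full key 
names reconstructed from memory, so worth double-checking the ForSt-specific 
prefixes against the docs):

    state.backend.type: forst
    execution.checkpointing.incremental: true
    execution.state-recovery.claim-mode: CLAIM
    pipeline.max-parallelism: 120
    state.backend.forst.cache.size-based-limit: 350gb
    taskmanager.numberOfTaskSlots: 1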

A few questions:
- Is ~2h for a downscale at this state size in line with what others see, or 
does it suggest a misconfiguration? Most of the time appears to be spent in 
restore.

- Is the asymmetry (upscales being much faster than downscales) also 
expected?

- Given the asymmetry, are there settings or patterns that make autoscaling 
viable at ~1TB state beyond adjusting job.autoscaler.scale-down.max-factor and 
scale-down.interval? Or is the practical recommendation at this scale to keep 
parallelism static and rescale manually during low-traffic windows?
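
For context, these are the autoscaler knobs we have been adjusting so far 
(values here are illustrative, not our exact settings or a recommendation):

    job.autoscaler.enabled: "true"
    job.autoscaler.scale-down.max-factor: "0.5"
    job.autoscaler.scale-down.interval: "1h"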

Sidenote: we also periodically see OOMKilled containers when per-TM state is 
large, despite taskmanager.memory.managed.fraction 0.5 and 
kubernetes.taskmanager.memory.limit-factor 1.5. ForSt native memory appears to 
overshoot the managed budget. Any specific settings worth investigating before 
bumping pod memory?
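
To make the memory question concrete, this is the shape of our current TM 
memory config (the process size shown is an example value, not our exact 
number):

    taskmanager.memory.managed.fraction: 0.5
    kubernetes.taskmanager.memory.limit-factor: 1.5
    taskmanager.memory.process.size: 32g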

Happy to share full config and metrics if useful.
Thanks,
Francis
