Hi Team,

We have a session cluster running on Kubernetes in which multiple stateless
jobs run fine. We observed that once we submit a stateful job (state size
per checkpoint is 1 GB) to the same session cluster, the other jobs are
impacted: the stateful job starts to use more memory and CPU, and the pod
is eventually terminated.

To mitigate this issue and provide better resource isolation, we have
created multiple session clusters, where we launch a high-throughput
(stateful) job in one cluster and group the low-throughput jobs in
another cluster.
This seems to work fine, but managing it will become painful once we start
creating more session clusters for high-throughput jobs (10+ jobs), as we
will not have a single Flink endpoint to submit jobs to (as we do on YARN,
where we submit directly to the RM).

Could you please advise on how we should handle this better in
Kubernetes?



Regards,
Vinay Patil
