When the APIServer or etcd of your K8s cluster is under heavy load,
the fabric8 Kubernetes client
might time out when watching/renewing/getting the ConfigMap.
You could try increasing the read/connect timeout (default is 10s) of the
HTTP client.
env.java.opts:
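One way to raise those timeouts is through the fabric8 client's system properties, which can be passed via `env.java.opts` (a sketch; the property keys come from the fabric8 client's configuration, but the 30s values are assumptions, not taken from this thread):

```yaml
# flink-conf.yaml — hypothetical values, in milliseconds
# (fabric8 reads these as JVM system properties; both default to 10000, i.e. 10s)
env.java.opts: >-
  -Dkubernetes.connection.timeout=30000
  -Dkubernetes.request.timeout=30000
```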
Hi Enrique,
thanks for reaching out to the community. I'm not 100% sure what problem
you're facing. The log messages you're sharing could mean that the Flink
cluster still behaves normally, with some outages occurring and the HA
functionality kicking in.
The behavior you're seeing with leaders for the
To add to my post: instead of using the pod IP for the `jobmanager.rpc.address`
configuration, we start each JM pod with the fully qualified name `--host
..ns.svc:8081`, and this address gets persisted
to the ConfigMaps. In some scenarios, the leader address in the ConfigMaps
might differ.
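For illustration, a StatefulSet backed by a headless service gives each pod a stable DNS name of the form `<pod>.<service>.<namespace>.svc`, which can be injected into the `--host` argument via the downward API. This is a sketch with hypothetical names (`jobmanager`, `flink-jm`, `my-ns`), not the poster's actual manifest:

```yaml
# Hypothetical StatefulSet container spec: pass each JM pod's
# stable DNS name as --host instead of the pod IP
containers:
  - name: jobmanager
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name   # e.g. jobmanager-0
    args:
      - "--host"
      - "$(POD_NAME).flink-jm.my-ns.svc"   # service/namespace are placeholders
```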
For
Hi All,
Flink 1.13.0
I have a session cluster deployed as a StatefulSet with PVs and HA configured,
within a Kubernetes cluster. I have submitted jobs to it, and it all works
fine. Most of my jobs are long-running, typically consuming data from Kafka.
I have noticed that after some time all my