Hi, thanks for your reply, it was very helpful.

We tried to go with the second approach, enabling HA mode, and added these configuration values:

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-noa-edge-infra:2181
    high-availability.zookeeper.path.root: /flink
    high-availability.cluster-id: /flink
    high-availability.storageDir: /flink_state
    high-availability.jobmanager.port: 6150
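For reference, this is roughly how the storage volume is wired up on our side — a simplified sketch with made-up claim and volume names, not our actual manifests:

```yaml
# Hypothetical PVC backing high-availability.storageDir (names are illustrative)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flink-ha-state
spec:
  accessModes:
    - ReadWriteOnce        # mountable read-write by pods on a single node only
  resources:
    requests:
      storage: 10Gi
---
# The job-manager pod spec then mounts the claim at the storageDir path,
# along these lines (fragment of the pod template):
#
#   volumes:
#     - name: flink-state
#       persistentVolumeClaim:
#         claimName: flink-ha-state
#   ...
#   volumeMounts:
#     - name: flink-state
#       mountPath: /flink_state   # matches high-availability.storageDir
```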
For the storageDir, we are using a k8s persistent volume with ReadWriteOnce.

Recovery from job-manager failure is working now, but it looks like there are issues with the task managers. The same configuration file is used in the task managers as well, and there are a lot of errors in the task managers' logs:

    java.io.FileNotFoundException: /flink_state/flink/blob/job_9f4be579c7ab79817e25ed56762b7623/blob_p-5cf39313e388d9120c235528672fd267105be0e0-938e4347a98aa6166dc2625926fdab56 (No such file or directory)

It seems that the task managers are trying to access the job manager's storage dir — can this be avoided? The task managers do not have access to the job manager's persistent volume — is this mandatory?
If we don't have the option to use shared storage, is there a way to make ZooKeeper hold and manage the job state instead of using the shared storage?

Thanks,
Noa

From: bastien dine <bastien.d...@gmail.com>
Date: Friday, 4 February 2022 at 10:56
To: Koffman, Noa (Nokia - IL/Kfar Sava) <noa.koff...@nokia.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: Re: Flink High-Availability and Job-Manager recovery

Hello,

On k8s the current recommendation is to set up one job manager with HA enabled, so that the cluster does not lose state upon a crash.

1. The storage dir can for sure be on a kube PV. The directory should be shared among all JMs; you will need to map the volume to the same local directory (e.g. /data) so that the configuration is the same across all JMs.
2. You can have only one JM, but you still need to enable HA, since HA is what writes the cluster state into ZK & the storage dir.
3. I don't know anything about Beam, so I cannot help you with that. But per-job mode is not available on k8s (neither native nor standalone kube), see https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/standalone/kubernetes/#per-job-mode & https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/#per-job-mode — you would need YARN to do so (I think Mesos is deprecated). Application mode can be a bit tricky to understand: it "moves" the submission of the job inside the JM. The chosen solution will depend on your deployment needs; I can't tell you without knowing more. But session mode + streaming job deployment is pretty standard, and you can easily emulate "one cluster per job" with it (better for ops & tuning than one cluster with multiple jobs).

Hope this can help,
Regards,
Bastien

On Thu, 3 Feb 2022 at 15:12, Koffman, Noa (Nokia - IL/Kfar Sava) <noa.koff...@nokia.com> wrote:

Hi all,
We are currently deploying Flink on a k8s 3-node cluster, with 1 job manager and 3 task managers.
We are trying to understand the recommendation for deployment, more specifically for recovery from job-manager failure, and have some questions about that:

1. If we use a Flink HA solution (either Kubernetes HA or ZooKeeper), the documentation states we should define 'high-availability.storageDir'. In the examples we found, the storage is mostly HDFS or S3. We were wondering if we could use Kubernetes PersistentVolumes and PersistentVolumeClaims; if we do use those, can each job manager have its own volume, or must it be shared?
2. Is there a solution for job-manager recovery without HA? With the way our Flink is currently configured, killing the job-manager pod loses all the jobs. Is there a way to configure the job manager so that if it goes down and k8s restarts it, it will continue from the same state (restart all the tasks, etc.)?
For this, can a Persistent Volume be used, without HDFS or external solutions?
3. Regarding the deployment mode: we are working with Beam + Flink, and Flink is running in session mode. We have a few long-running streaming pipelines deployed (fewer than 10). Is 'session' mode the right deployment mode for our type of deployment, or should we consider switching to something different (per-job/application)?

Thanks
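For context, these are the kinds of HA configuration fragments we are weighing, based on the Flink 1.13 docs — values are placeholders, not our real config:

```yaml
# Illustrative flink-conf.yaml fragments (placeholder values)

# Option A: ZooKeeper-based HA
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-host:2181
high-availability.storageDir: /flink_state    # must be reachable by every JM

# Option B: Kubernetes HA services (Flink 1.12+), no ZooKeeper needed
# high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
# kubernetes.cluster-id: my-flink-cluster
# high-availability.storageDir: s3://bucket/flink-ha   # or a shared volume path
```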