Hi,

I'm trying to enable HA for my Flink jobs running on AWS EMR.
Following [1], I created a common Flink YARN session and am submitting all my
jobs to it. These four config params were added:
    high-availability = zookeeper
    high-availability.storageDir =  
    high-availability.zookepper.path.root = /flink
    high-availability.zookeeper.quorum = <EMR's master node's DNS name>:2181
(The ZooKeeper that came with EMR was used.)
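
As a sanity check on the four keys listed above, here is a minimal sketch that compares them against known HA option names. The set of known names is an assumption taken from the Flink 1.10 configuration documentation and may be incomplete, and `unknown_keys` is a hypothetical helper, not part of Flink.

```python
# Sketch: flag HA keys that do not match a documented Flink option name.
# KNOWN_HA_KEYS is an assumption based on the Flink 1.10 configuration docs;
# unknown_keys() is a hypothetical helper, not a Flink API.
KNOWN_HA_KEYS = {
    "high-availability",
    "high-availability.storageDir",
    "high-availability.cluster-id",
    "high-availability.zookeeper.quorum",
    "high-availability.zookeeper.path.root",
}


def unknown_keys(configured):
    """Return the configured keys that are not in KNOWN_HA_KEYS, sorted."""
    return sorted(k for k in configured if k not in KNOWN_HA_KEYS)


# Keys exactly as they are written in this message.
configured = [
    "high-availability",
    "high-availability.storageDir",
    "high-availability.zookepper.path.root",
    "high-availability.zookeeper.quorum",
]

for key in unknown_keys(configured):
    print("unrecognized key:", key)
```

Run against the keys as written here, the sketch reports `high-availability.zookepper.path.root`. If that spelling is also what is in the actual configuration, the root path setting would probably not take effect, although `/flink` appears to be the documented default anyway.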

The command to start that Flink YARN session is like this:
`flink-yarn-session -Dtaskmanager.memory.process.size=4g -nm FlinkCommonSession -z FlinkCommonSession -d`

The first HA test (YARN application killed) went well. I killed the common
session using `yarn application --kill <appId>` and created a new session
using the same command; the jobs were restored automatically once that
session was up.

However, the second HA test (EMR cluster crashed) didn't work: the jobs were
not restored after the common session was created on the new EMR cluster
(attached: jobmanager.gz
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/jobmanager.gz>).

Should I expect the jobs to be restored in the second scenario (EMR cluster
crashed)? Am I missing something here?

Thanks for your help.

Regards,
Averell

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/yarn_setup.html



