Hello Roman,

Thanks for your time.
I'm using EMR 5.30.1 (Flink 1.10.0) with 1 master node.
/yarn.application-attempts/ is not set (does that mean unlimited?), while
/yarn.resourcemanager.am.max-attempts/ is 4.

By "EMR cluster crashed" I meant that the cluster is lost. Some scenarios
that could lead to this are:
  - The master node is down
  - The cluster is accidentally / deliberately terminated.

I found a thread on our mailing list [1], in which Fabian mentioned a
/"pointer"/ stored in Zookeeper. It looks like this piece of information is
kept in Zookeeper's dataDir, which by default lives on the local storage of
the EMR master node. I'm trying to move that directory to EFS in the hope
that it helps, but I'm not sure whether this is the right approach.

Thanks for your help.
Regards,
Averell


[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/HA-and-zookeeper-tp27093p27119.html


