I guess this is a known issue, tracked by SPARK-15544 [1] and SPARK-23530
[2] (the latter seems to be a duplicate).

I guess that's the simplest implementation of H/A (since we don't bother
with preserving current state in the master): a background process like
supervisord restarts the master process when it is no longer running. But
if no such background process is set up, all master processes may
eventually end up shut down.
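
For illustration, a minimal supervisord entry for that pattern might look
like the following (the /opt/spark install path and the program name are
just my assumptions, not something Spark ships):

    [program:spark-master]
    ; run the master in the foreground so supervisord can track it
    command=/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
    autostart=true
    ; restart whenever the master exits, including after a ZK session expiry
    autorestart=true
    startretries=999999

With that in place, the master exiting on session expiry is harmless: it
comes back up, re-registers with ZK, and rejoins the election as a follower.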

IMHO the safer approach is to store all information in ZK (as the source of
truth) and let only the leader master read and write it. Follower masters
just wait, and load that information when one of them becomes leader. That
would require pretty substantial changes, though.
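
To sketch what I mean with Apache Curator (which Spark already uses for
leader election) - note the znode paths and the state layout below are
hypothetical, not Spark's actual code:

    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.framework.recipes.leader.LeaderLatch
    import org.apache.curator.retry.ExponentialBackoffRetry

    object LeaderOwnedState {
      def main(args: Array[String]): Unit = {
        // keep retrying on connection loss instead of shutting down
        val client = CuratorFrameworkFactory.newClient(
          "zk1:2181,zk2:2181,zk3:2181",
          new ExponentialBackoffRetry(1000, 29))
        client.start()

        // every master instance competes under the same latch path
        val latch = new LeaderLatch(client, "/spark/leader-latch")
        latch.start()

        // followers block here and touch no state until they win leadership
        latch.await()

        // only the elected leader reads/writes the source of truth in ZK
        val statePath = "/spark/master-state" // hypothetical znode
        val state =
          if (client.checkExists().forPath(statePath) != null)
            client.getData.forPath(statePath)
          else Array.emptyByteArray
        println(s"became leader, loaded ${state.length} bytes of state")

        // ... serve as leader, writing each state change back to statePath ...
      }
    }

The point of the sketch is that a connection blip only blocks on the latch
or retries inside Curator; the process never has to exit, because the state
lives in ZK rather than in the master's memory.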

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-15544
2. https://issues.apache.org/jira/browse/SPARK-23530

On Tue, Mar 5, 2019 at 10:02 PM, lokeshkumar <lok...@dataken.net> wrote:

> As I understand it, the Apache Spark Master can be run in high availability
> mode using ZooKeeper. That is, multiple Spark masters can run in
> leader/follower mode, registering themselves with ZooKeeper.
>
> In our scenario, ZooKeeper is expiring the session of the Spark Master that
> is currently the leader. The leader Spark Master receives this notification
> and deliberately shuts itself down.
>
> Can someone explain why the decision was made to shut down rather than
> retry?
>
> And why does Kafka retry connecting to ZooKeeper when it receives the same
> expiry notification?
