Hi, I am planning to go to production with Spark standalone mode using the following configuration, and I would like to know whether I am missing anything. Any other suggestions are welcome.

1) Three Spark standalone masters deployed on different nodes, using Apache ZooKeeper for leader election.
2) Two or three worker nodes. Our workload needs to process 5000 messages/sec; two worker nodes were more than sufficient in our tests, but to be on the safe side we may use three.
3) HDFS for storing recoverable state (WAL, checkpoints, etc.), since we are running a streaming application.
4) Some sort of monitoring and alerting framework.

Do I need anything else apart from this?

What's not clear to me is how service discovery is done. For example, right now we have to manually edit the IP addresses of the worker machines in SPARK_HOME/conf/slaves, which means we have to bring the entire cluster down. What is the most common way to solve this, given that we don't plan on using Mesos or YARN? I know of some tools that can help me here, but I would like to know which of them are widely used.

Any other suggestions, in case I am missing something, are welcome.

Thanks,
kant
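
P.S. For reference, here is a minimal sketch of what the master setup in item 1 would look like in conf/spark-env.sh on each master node. The ZooKeeper hostnames (zk1, zk2, zk3), the master hostnames in the comment, and the /spark znode path are placeholders for illustration, not our actual values.

```shell
# conf/spark-env.sh on each of the three master nodes (sketch only).
# zk1/zk2/zk3 and /spark are placeholder values, not real hosts or paths.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"

# Workers and applications would then list all masters in the master URL, e.g.
#   spark://master1:7077,master2:7077,master3:7077
# (master1..master3 are placeholder hostnames as well).
```

With this, the standby masters register with ZooKeeper and one of them takes over if the current leader fails.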