Spark client reconnect to driver in yarn-cluster deployment mode

2015-01-15 Thread preeze
From the official Spark documentation (http://spark.apache.org/docs/1.2.0/running-on-yarn.html): "In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application." Is there any d

Apache Spark client high availability

2015-01-12 Thread preeze
Dear community, I've been searching the internet for quite a while to find out what the best architecture is for supporting HA for a Spark client. We run an application that connects to a standalone Spark cluster and caches a big chunk of data for subsequent intensive calculations. To achieve HA we'l