Hi, I have a couple of use cases for Apache Spark applications/scripts, generally of the following form:
*General ETL use case* - more specifically, a transformation of a Cassandra
column family containing many events (think event sourcing) into various
aggregated column families.

*Streaming use case* - real-time analysis of the events as they arrive in
the system.

For *(1)*, I'll need to kick off the Spark application periodically. For
*(2)*, I'll just kick off the long-running Spark Streaming process at boot
time and let it go. Rough sketches of both jobs are at the end of this
mail.

/(Note - I'm using Spark Standalone as the cluster manager, so no YARN or
Mesos.)/

I'm trying to figure out the most common / best-practice deployment
strategies for Spark applications. So far the options I can see are:

1. *Deploying my program as a jar, and running the various tasks with
spark-submit* - which seems to be the way recommended in the Spark docs.
Some thoughts about this strategy:
  * how do you start/stop tasks - just using simple bash scripts?
  * how is scheduling managed? - simply use cron?
  * any resilience? (e.g. who schedules the jobs to run if the driver
    server dies?)

2. *Creating a separate webapp as the driver program* (see the sketch at
the end of this mail):
  * creates a SparkContext programmatically to talk to the Spark cluster
  * allows users to kick off tasks through the HTTP interface
  * uses Quartz (for example) to manage scheduling
  * could use clustering with ZooKeeper election for resilience

3. *Spark job server* (https://github.com/ooyala/spark-jobserver):
  * I don't think there's much benefit over (2) for me, as I don't (yet)
    have many teams and projects talking to Spark, and I would still need
    some app to talk to the job server anyway
  * no scheduling built in, as far as I can see

I'd like to understand the general consensus w.r.t. a simple but robust
deployment strategy; I haven't been able to determine one by trawling the
web as of yet.

Thanks very much!
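P.S. To make the use cases concrete - a minimal sketch of the kind of ETL
job I mean for *(1)*, assuming the DataStax spark-cassandra-connector; the
keyspace, table, and column names are made up:

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    object EventAggregationJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("event-aggregation")
          // placeholder Cassandra contact point
          .set("spark.cassandra.connection.host", "127.0.0.1")
        val sc = new SparkContext(conf)

        // Read the raw events column family, roll it up, and write the
        // aggregate back out to a separate column family
        sc.cassandraTable("events_ks", "events")   // hypothetical schema
          .map(row => (row.getString("event_type"), 1L))
          .reduceByKey(_ + _)
          .saveToCassandra("events_ks", "events_by_type",
            SomeColumns("event_type", "count"))

        sc.stop()
      }
    }

This is the sort of finite job I'd want to schedule periodically.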
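And the long-running streaming driver for *(2)* - the socket source here is
just a stand-in for however the events actually arrive (Kafka, Flume, ...),
and the comma-separated format is made up:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object EventStreamingJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("event-streaming")
        val ssc = new StreamingContext(conf, Seconds(10)) // 10s micro-batches

        // Placeholder source - stands in for the real event transport
        val events = ssc.socketTextStream("localhost", 9999)

        // Trivial per-batch aggregation, just to show the shape of the job
        events
          .map(line => (line.split(",")(0), 1L)) // first field = event type
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination() // runs until killed - "start at boot and let it go"
      }
    }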
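Finally, the sort of thing I mean by option (2)'s programmatic context - a
long-lived SparkContext owned by the webapp, with Quartz firing jobs against
it. The master URL, jar path, and job body are all hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.quartz.{Job, JobExecutionContext}

    // One long-lived context owned by the webapp process (the driver)
    object SparkContextHolder {
      lazy val sc: SparkContext = new SparkContext(
        new SparkConf()
          .setMaster("spark://master:7077")           // placeholder master URL
          .setAppName("analytics-webapp")
          .setJars(Seq("/path/to/app-assembly.jar"))) // ship job code to executors
    }

    // Quartz job wrapping the periodic ETL, fired by a CronTrigger
    class AggregationJob extends Job {
      def execute(ctx: JobExecutionContext): Unit = {
        val sc = SparkContextHolder.sc
        // ... same aggregation logic as the standalone ETL job above ...
        sc.parallelize(1 to 10).count() // stand-in for the real work
      }
    }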