Hi All, I'm not quite clear on whether submitting a Python application to a Spark standalone cluster on EC2 is possible.
Am I reading this correctly:

*A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the client spark-submit process, with the input and output of the application attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. Spark shell). Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications.*

So I shouldn't be able to do something like:

./bin/spark-submit --master spark://xxxxx.compute-1.amazonaws.com:7077 examples/src/main/python/pi.py

from a laptop connecting to a previously launched Spark cluster (created with the default spark-ec2 script), correct?

If I'm not mistaken about this, then the docs are slightly confusing -- the above is more or less the example given here: https://spark.apache.org/docs/1.1.0/submitting-applications.html

If I am mistaken, apologies -- can you help me figure out where I went wrong? I've also opened port 7077 to 0.0.0.0/0.

--Ben
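For what it's worth, a common workaround that matches the quoted docs is to submit from the master node itself, where client mode applies. This is only a sketch: it assumes the default spark-ec2 layout (Spark under /root/spark, login as root), and the hostname and key file below are placeholders, not values from this thread.

```shell
# Placeholders -- substitute your own master hostname and EC2 key pair.
MASTER=xxxxx.compute-1.amazonaws.com
KEY=my-ec2-key.pem

# Copy the Python application to the master node.
scp -i "$KEY" examples/src/main/python/pi.py root@"$MASTER":~/

# Run spark-submit on the master, so the driver is co-located with the
# workers (client mode, which standalone clusters do support for Python).
ssh -i "$KEY" root@"$MASTER" \
  "/root/spark/bin/spark-submit --master spark://$MASTER:7077 ~/pi.py"
```

The point is that the driver process then runs inside the cluster's network, so the "gateway machine" guidance in the docs is satisfied even though cluster mode itself is unavailable for Python.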