Hi All,

I'm not quite clear on whether submitting a python application to spark
standalone on ec2 is possible.

Am I reading this correctly:

*A common deployment strategy is to submit your application from a gateway
machine that is physically co-located with your worker machines (e.g.
Master node in a standalone EC2 cluster). In this setup, client mode is
appropriate. In client mode, the driver is launched directly within the
client spark-submit process, with the input and output of the application
attached to the console. Thus, this mode is especially suitable for
applications that involve the REPL (e.g. Spark shell).

Alternatively, if your application is submitted from a machine far from the
worker machines (e.g. locally on your laptop), it is common to use cluster mode
to minimize network latency between the drivers and the executors. Note
that cluster mode is currently not supported for standalone clusters, Mesos
clusters, or python applications.*


So I shouldn't be able to do something like:

./bin/spark-submit --master spark://xxxxx.compute-1.amazonaws.com:7077 \
  examples/src/main/python/pi.py


From a laptop connecting to a Spark cluster previously launched with the
default spark-ec2 script, correct?
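
In case it helps frame the question: the alternative I understand the docs to
be recommending is to ssh to the master and submit from there in client mode.
A sketch of what I mean (the hostname is the same placeholder as above, and
my-key.pem is just a stand-in for whatever keypair spark-ec2 was launched with):

```shell
# From my laptop: ssh to the standalone master first (placeholder hostname/key)
ssh -i my-key.pem root@xxxxx.compute-1.amazonaws.com

# Then, on the master node itself, submit in client mode.
# --deploy-mode client is the default, so this should be the same as omitting it:
./bin/spark-submit \
  --master spark://xxxxx.compute-1.amazonaws.com:7077 \
  --deploy-mode client \
  examples/src/main/python/pi.py
```

If that's the only supported path for Python on standalone, I can live with it,
but I'd like to confirm that's the intent.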


If I am not mistaken about this, then the docs are slightly confusing -- the
above example is more or less the one given here:
https://spark.apache.org/docs/1.1.0/submitting-applications.html


If I am mistaken, apologies -- can you help me figure out where I went wrong?

I've also tried opening port 7077 to 0.0.0.0/0, in case that was the issue.

--Ben
