Hi Ognen,

The docs currently state: "Note that cluster mode is currently not supported
for standalone clusters, Mesos clusters, or python applications."
So it seems like YARN + Scala is the only option for fire-and-forget. It
shouldn't be too hard to create a "proxy" submitter, but yes, that does
involve another process (potentially a server) on that side. I've heard good
things about Ooyala's job server, but haven't gotten around to setting it up,
so I can't really comment on it.
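If you did want to roll your own proxy, a minimal sketch (purely illustrative
- the function name and paths are mine, not from any existing project) could
just shell out to spark-submit from a long-lived process on a box near the
cluster:

import subprocess

def submit(app_path, master_url):
    # In client mode, the spawned spark-submit process *is* the driver,
    # so this process must stay alive until the job completes.
    return subprocess.Popen([
        "./bin/spark-submit",
        "--master", master_url,
        app_path,
    ])

# e.g. submit("examples/src/main/python/pi.py", "spark://master-host:7077")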
Regards,
Ashic
> Date: Sat, 15 Nov 2014 09:50:14 -0600
> From: ognen.duzlev...@gmail.com
> To: as...@live.com
> CC: quasi...@gmail.com; user@spark.apache.org
> Subject: Re: Submitting Python Applications from Remote to Master
> 
> Ashic,
> 
> Thanks for your email.
> 
> Two things:
> 
> 1. I think a whole lot of data scientists and other people would love
> it if they could just fire off jobs from their laptops. It is, in my
> opinion, a commonly desired use case.
> 
> 2. Did anyone actually get the Ooyala job server to work? I asked that
> question 6 months ago and never got a straight answer. I ended up
> writing a middle layer using Scalatra and actors to submit jobs via an
> API and receive results back as JSON. In doing so I ran into the
> inability to share the SparkContext "feature", and it took a lot of
> finagling to make things work (but it never felt "production ready").
> 
> Ognen
> 
> On Sat, Nov 15, 2014 at 03:36:43PM +0000, Ashic Mahtab wrote:
> > Hi Ben,
> >
> > I haven't tried it with Python, but the instructions are the same as for
> > Scala compiled (jar) apps. What it's saying is that it's not possible to
> > offload the entire work to the master (a la Hadoop) in a fire-and-forget
> > (or rather submit-and-forget) manner when running on standalone. There are
> > two deploy modes - client and cluster. For standalone, only client is
> > supported. What this means is that the "submitting process" will be the
> > driver process (not to be confused with the "master"). It should very well
> > be possible to submit from your laptop to a standalone cluster, but the
> > process running spark-submit will stay alive until the job finishes. If
> > you terminate that process (via kill -9 or otherwise), the job will be
> > terminated as well. The driver process submits the work to the Spark
> > master, which does the usual divvying up of tasks, distribution, fault
> > tolerance, etc., and the results are reported back to the driver process.
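> >
> > For instance (the host is just a placeholder), an explicit client-mode
> > submit looks something like:
> >
> > ./bin/spark-submit --master spark://<master-host>:7077 --deploy-mode client my_app.py
> >
> > Kill that spark-submit process and the job dies with it.
> >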
> > Often it's not possible to have arbitrary access to the Spark master, and
> > if jobs take hours to complete, it's not feasible to keep the process
> > running on a laptop without interruptions, disconnects, etc. As such, a
> > "gateway" machine closer to the Spark master is used to submit jobs. That
> > way, the process on the gateway machine lives for the duration of the job,
> > and no connection from the laptop, etc. is needed.
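> >
> > For example, running something like this on the gateway box:
> >
> > nohup ./bin/spark-submit --master spark://<master-host>:7077 my_app.py > job.log 2>&1 &
> >
> > keeps the driver alive there even after your laptop disconnects.
> >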
> > It's not uncommon to actually have an API on the gateway machine. For
> > example, Ooyala's job server https://github.com/ooyala/spark-jobserver
> > provides a RESTful interface for submitting jobs.
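> >
> > From memory of its README (so treat the port, paths, and names below as
> > assumptions to verify), submitting a packaged job looks roughly like:
> >
> > curl --data-binary @my-job.jar gateway-host:8090/jars/myapp
> > curl -d "" 'gateway-host:8090/jobs?appName=myapp&classPath=com.example.MyJob'
> >
> > The second call returns the job id as JSON, which can then be polled for
> > status and results.
> >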
> > Does that help?
> > Regards,
> > Ashic
> > Date: Fri, 14 Nov 2014 13:40:43 -0600
> > Subject: Submitting Python Applications from Remote to Master
> > From: quasi...@gmail.com
> > To: user@spark.apache.org
> > 
> > Hi All,
> > I'm not quite clear on whether submitting a Python application to Spark
> > standalone on EC2 is possible.
> > Am I reading this correctly:
> > "A common deployment strategy is to submit your application from a gateway
> > machine that is physically co-located with your worker machines (e.g.
> > Master node in a standalone EC2 cluster). In this setup, client mode is
> > appropriate. In client mode, the driver is launched directly within the
> > client spark-submit process, with the input and output of the application
> > attached to the console. Thus, this mode is especially suitable for
> > applications that involve the REPL (e.g. Spark shell). Alternatively, if
> > your application is submitted from a machine far from the worker machines
> > (e.g. locally on your laptop), it is common to use cluster mode to minimize
> > network latency between the drivers and the executors. Note that cluster
> > mode is currently not supported for standalone clusters, Mesos clusters, or
> > python applications."
> > So I shouldn't be able to do something like:
> >
> > ./bin/spark-submit --master spark://xxxxx.compute-1.amazonaws.com:7077 examples/src/main/python/pi.py
> >
> > from a laptop connecting to a previously launched Spark cluster using the
> > default spark-ec2 script, correct?
> > If I am not mistaken about this, then the docs are slightly confusing --
> > the above example is more or less the example here:
> > https://spark.apache.org/docs/1.1.0/submitting-applications.html
> > If I am mistaken, apologies; can you help me figure out where I went
> > wrong? I've also taken to opening port 7077 to 0.0.0.0/0.
> > --Ben
> >
> 
> -- 
> "Convictions are more dangerous enemies of truth than lies." - Friedrich 
> Nietzsche
> 