http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
2015-12-29 11:48 GMT-08:00 Annabel Melongo <melongo_anna...@yahoo.com>:

> Greg,
>
> Can you please send me a doc describing the standalone cluster mode?
> Honestly, I've never heard of it.
>
> The three different modes I've listed appear in the last paragraph of
> this doc: Running Spark Applications
> <http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_running_spark_apps.html>
>
> On Tuesday, December 29, 2015 2:42 PM, Andrew Or <and...@databricks.com> wrote:
>
> > The confusion here is the expression "standalone cluster mode". Either
> > it's stand-alone or it's cluster mode, but it can't be both.
>
> @Annabel That's not true. There *is* a standalone cluster mode, where the
> driver runs on one of the workers instead of on the client machine. What
> you're describing is standalone client mode.
>
> 2015-12-29 11:32 GMT-08:00 Annabel Melongo <melongo_anna...@yahoo.com>:
>
> Greg,
>
> The confusion here is the expression "standalone cluster mode". Either
> it's stand-alone or it's cluster mode, but it can't be both.
>
> With this in mind, here's how jars are uploaded:
> 1. Spark standalone mode: the client and the driver run on the same
>    machine; use the --packages option to submit a jar.
> 2. YARN cluster mode: the client and the driver run on separate machines;
>    additionally, the driver runs as a thread in the ApplicationMaster;
>    use the --jars option with a globally visible path to said jar.
> 3. YARN client mode: the client and the driver run on the same machine;
>    the driver is *NOT* a thread in the ApplicationMaster; use --packages
>    to submit a jar.
>
> On Tuesday, December 29, 2015 1:54 PM, Andrew Or <and...@databricks.com> wrote:
>
> Hi Greg,
>
> It's actually intentional for standalone cluster mode to not upload jars.
> One of the reasons why YARN takes at least 10 seconds before running any
> simple application is that there's a lot of random overhead (e.g.,
> putting jars in HDFS). If this missing functionality is not documented
> somewhere, then we should add that.
>
> Also, the packages problem seems legitimate. Thanks for reporting it. I
> have filed https://issues.apache.org/jira/browse/SPARK-12559.
>
> -Andrew
>
> 2015-12-29 4:18 GMT-08:00 Greg Hill <greg.h...@rackspace.com>:
>
> On 12/28/15, 5:16 PM, "Daniel Valdivia" <h...@danielvaldivia.com> wrote:
>
> > Hi,
> >
> > I'm trying to submit a job to a small Spark cluster running in
> > standalone mode; however, it seems like the jar file I'm submitting to
> > the cluster is "not found" by the worker nodes.
> >
> > I might have misunderstood, but I thought the driver node would send
> > this jar file to the worker nodes. Or should I manually send this file
> > to each worker node before I submit the job?
>
> Yes, you have misunderstood, but so did I. The problem is that
> --deploy-mode cluster runs the Driver on the cluster as well, and you
> don't know which node it's going to run on, so every node needs access
> to the JAR. spark-submit does not pass the JAR along to the Driver, but
> the Driver will pass it to the executors.
>
> I ended up putting the JAR in HDFS and passing an hdfs:// path to
> spark-submit.
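> For example, something along these lines (just a sketch; the paths,
> master URL, and class name are illustrative):
>
>     # put the application JAR somewhere every node can reach it
>     hdfs dfs -mkdir -p /user/greg/jars
>     hdfs dfs -put myapp-1.0.jar /user/greg/jars/
>
>     # submit with an hdfs:// path instead of a local one
>     spark-submit \
>       --master spark://master-host:7077 \
>       --deploy-mode cluster \
>       --class com.example.MyApp \
>       hdfs:///user/greg/jars/myapp-1.0.jar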
> This is a subtle difference from Spark on YARN, which does pass the JAR
> along to the Driver automatically, and IMO should probably be fixed in
> spark-submit. It's really confusing for newcomers.
>
> Another problem I ran into, and that you might too, is that --packages
> doesn't work with --deploy-mode cluster. It downloads the packages to a
> temporary location on the node running spark-submit, then passes those
> paths to the node that is running the Driver, but since that isn't the
> same machine, it can't find anything and fails. The Driver process
> *should* be the one doing the downloading, but it isn't. I ended up
> having to create a fat JAR with all of the dependencies to get around
> that one.
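> For instance (again only a sketch; the build tool and names are
> illustrative), with the sbt-assembly plugin you can bundle the
> dependencies into one JAR and skip --packages entirely:
>
>     # build a single JAR containing the app plus all its dependencies
>     # (assumes the sbt-assembly plugin is configured for the project)
>     sbt assembly
>
>     # ship the fat JAR to HDFS and submit it; no --packages needed
>     hdfs dfs -put target/scala-2.10/myapp-assembly-1.0.jar /user/greg/jars/
>     spark-submit \
>       --master spark://master-host:7077 \
>       --deploy-mode cluster \
>       --class com.example.MyApp \
>       hdfs:///user/greg/jars/myapp-assembly-1.0.jar
>
> Greg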