Re: Can't submit job to stand alone cluster

Andrew Or Tue, 29 Dec 2015 10:55:19 -0800

Hi Greg,

It's actually intentional for standalone cluster mode to not upload jars.
One of the reasons why YARN takes at least 10 seconds before running any
simple application is because there's a lot of random overhead (e.g.
putting jars in HDFS). If this missing functionality is not documented
somewhere then we should add that.


Also, the packages problem seems legitimate. Thanks for reporting it. I
have filed https://issues.apache.org/jira/browse/SPARK-12559.

-Andrew

2015-12-29 4:18 GMT-08:00 Greg Hill <greg.h...@rackspace.com>:

>
>
> On 12/28/15, 5:16 PM, "Daniel Valdivia" <h...@danielvaldivia.com> wrote:
>
> >Hi,
> >
> >I'm trying to submit a job to a small spark cluster running in stand
> >alone mode, however it seems like the jar file I'm submitting to the
> >cluster is "not found" by the workers nodes.
> >
> >I might have understood wrong, but I though the Driver node would send
> >this jar file to the worker nodes, or should I manually send this file to
> >each worker node before I submit the job?
>
> Yes, you have misunderstood, but so did I.  So the problem is that
> --deploy-mode cluster runs the Driver on the cluster as well, and you
> don't know which node it's going to run on, so every node needs access to
> the JAR.  spark-submit does not pass the JAR along to the Driver, but the
> Driver will pass it to the executors.  I ended up putting the JAR in HDFS
> and passing an hdfs:// path to spark-submit.  This is a subtle difference
> from Spark on YARN which does pass the JAR along to the Driver
> automatically, and IMO should probably be fixed in spark-submit.  It's
> really confusing for newcomers.
>
> Another problem I ran into that you also might is that --packages doesn't
> work with --deploy-mode cluster.  It downloads the packages to a temporary
> location on the node running spark-submit, then passes those paths to the
> node that is running the Driver, but since that isn't the same machine, it
> can't find anything and fails.  The driver process *should* be the one
> doing the downloading, but it isn't. I ended up having to create a fat JAR
> with all of the dependencies to get around that one.
>
> Greg
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>

Re: Can't submit job to stand alone cluster

Reply via email to