Andrew,
Now I see where the confusion lies. Standalone cluster mode, described in your
link, is nothing but a combination of the client mode and the standalone mode
from my link, without YARN.
But I'm confused by this paragraph in your link:
        If your application is launched through Spark submit, then the
        application jar is automatically distributed to all worker nodes.
        For any additional jars that your application depends on, you
        should specify them through the --jars flag using comma as a
        delimiter (e.g. --jars jar1,jar2).
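For concreteness, the usage that paragraph describes would look something
like this sketch (class name and jar paths below are made up):

    spark-submit --class com.example.Main \
        --jars /path/to/dep1.jar,/path/to/dep2.jar \
        app.jar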
That paragraph can't be true; this is only the case when Spark runs on top of
YARN. Please correct me if I'm wrong.
Thanks   

    On Tuesday, December 29, 2015 2:54 PM, Andrew Or <and...@databricks.com> 
wrote:
http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications

2015-12-29 11:48 GMT-08:00 Annabel Melongo <melongo_anna...@yahoo.com>:

Greg,
Can you please send me a doc describing the standalone cluster mode? Honestly,
I've never heard of it.
The three different modes I've listed appear in the last paragraph of this
doc: Running Spark Applications (www.cloudera.com).



    On Tuesday, December 29, 2015 2:42 PM, Andrew Or <and...@databricks.com> 
wrote:
> The confusion here is the expression "standalone cluster mode". Either it's
> stand-alone or it's cluster mode, but it can't be both.

@Annabel That's not true. There is a standalone cluster mode, where the driver
runs on one of the workers instead of on the client machine. What you're
describing is standalone client mode.
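For example, the only thing that changes on the command line is the
--deploy-mode flag (master URL, class name and jar path below are
hypothetical):

    # standalone client mode: driver runs on the machine running spark-submit
    spark-submit --master spark://master:7077 --deploy-mode client \
        --class com.example.Main app.jar

    # standalone cluster mode: driver runs on one of the worker nodes
    spark-submit --master spark://master:7077 --deploy-mode cluster \
        --class com.example.Main app.jar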
2015-12-29 11:32 GMT-08:00 Annabel Melongo <melongo_anna...@yahoo.com>:

Greg,
The confusion here is the expression "standalone cluster mode". Either it's
stand-alone or it's cluster mode, but it can't be both.
With this in mind, here's how jars are uploaded (rough command sketches
follow the list):
   1. Spark stand-alone mode: client and driver run on the same machine;
      use the --packages option to submit a jar.
   2. YARN cluster mode: client and driver run on separate machines;
      additionally, the driver runs as a thread in the ApplicationMaster;
      use the --jars option with a globally visible path to said jar.
   3. YARN client mode: client and driver run on the same machine; the
      driver is NOT a thread in the ApplicationMaster; use --packages to
      submit a jar.
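To make the three cases concrete, rough spark-submit sketches (master URLs,
Maven coordinates, and paths below are illustrative, not from a real setup):

    # 1. Stand-alone mode: driver on the submitting machine
    spark-submit --master spark://master:7077 \
        --class com.example.Main \
        --packages com.databricks:spark-csv_2.10:1.3.0 \
        app.jar

    # 2. YARN cluster mode: driver runs inside the ApplicationMaster
    spark-submit --master yarn --deploy-mode cluster \
        --class com.example.Main \
        --jars hdfs:///libs/dep1.jar,hdfs:///libs/dep2.jar \
        app.jar

    # 3. YARN client mode: driver on the submitting machine
    spark-submit --master yarn --deploy-mode client \
        --class com.example.Main \
        --packages com.databricks:spark-csv_2.10:1.3.0 \
        app.jar

Note that --packages takes Maven coordinates, so it pulls dependencies from a
repository rather than uploading your own jar; the application jar itself is
always the last argument.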

    On Tuesday, December 29, 2015 1:54 PM, Andrew Or <and...@databricks.com> 
wrote:
Hi Greg,
It's actually intentional for standalone cluster mode to not upload jars. One
of the reasons YARN takes at least 10 seconds before running any simple
application is that there's a lot of random overhead (e.g. putting jars in
HDFS). If this missing functionality is not documented somewhere, then we
should add that.

Also, the --packages problem seems legitimate. Thanks for reporting it. I have
filed https://issues.apache.org/jira/browse/SPARK-12559.
-Andrew
2015-12-29 4:18 GMT-08:00 Greg Hill <greg.h...@rackspace.com>:



On 12/28/15, 5:16 PM, "Daniel Valdivia" <h...@danielvaldivia.com> wrote:

>Hi,
>
>I'm trying to submit a job to a small Spark cluster running in standalone
>mode, however it seems like the jar file I'm submitting to the cluster is
>"not found" by the worker nodes.
>
>I might have understood wrong, but I thought the Driver node would send
>this jar file to the worker nodes, or should I manually send this file to
>each worker node before I submit the job?

Yes, you have misunderstood, but so did I.  So the problem is that
--deploy-mode cluster runs the Driver on the cluster as well, and you
don't know which node it's going to run on, so every node needs access to
the JAR.  spark-submit does not pass the JAR along to the Driver, but the
Driver will pass it to the executors.  I ended up putting the JAR in HDFS
and passing an hdfs:// path to spark-submit.  This is a subtle difference
from Spark on YARN which does pass the JAR along to the Driver
automatically, and IMO should probably be fixed in spark-submit.  It's
really confusing for newcomers.
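In case it helps, the workaround looked roughly like this (host names and
paths are made up):

    # put the application jar somewhere every node can read it
    hdfs dfs -put myapp.jar /apps/myapp.jar

    # pass an hdfs:// path instead of a local one
    spark-submit --master spark://master:7077 --deploy-mode cluster \
        --class com.example.Main hdfs:///apps/myapp.jar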

Another problem I ran into, and you might too, is that --packages doesn't
work with --deploy-mode cluster.  It downloads the packages to a temporary
location on the node running spark-submit, then passes those paths to the
node that is running the Driver, but since that isn't the same machine, it
can't find anything and fails.  The driver process *should* be the one
doing the downloading, but it isn't. I ended up having to create a fat JAR
with all of the dependencies to get around that one.
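Roughly what that looked like for me (assuming sbt with the sbt-assembly
plugin; names and versions are illustrative):

    # build a fat JAR that bundles all of the dependencies
    sbt assembly

    # submit the assembled jar instead of relying on --packages
    spark-submit --master spark://master:7077 --deploy-mode cluster \
        --class com.example.Main \
        target/scala-2.10/myapp-assembly-1.0.jar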

Greg


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
