Hi Randy and Gino,

The issue is that standalone-cluster mode is not officially supported. Please use standalone-client mode instead, i.e. specify --deploy-mode client in spark-submit, or simply leave out this config because it defaults to client mode.
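To make the client-mode invocation concrete, here is a minimal sketch in Python that only assembles the spark-submit argument list; it does not run anything. The helper name build_submit_command, the master URL, the main class, and the ./bin/spark-submit path are placeholders I made up for illustration. Only the --master, --deploy-mode, and --class flags come from spark-submit itself.

```python
import shlex


def build_submit_command(master_url, main_class, app_jar, deploy_mode="client"):
    """Assemble a spark-submit argument list (hypothetical helper, not part of Spark)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "./bin/spark-submit",           # assumed path, relative to the Spark home directory
        "--master", master_url,         # standalone master URL, e.g. spark://host:port
        "--deploy-mode", deploy_mode,   # "client" keeps the driver in the submitting process
        "--class", main_class,          # fully qualified main class of the application
        app_jar,                        # application jar
    ]


# Placeholder values; substitute your own master URL, class, and jar.
cmd = build_submit_command("spark://master-host:7077", "com.example.MyApp", "test.jar")

# Print a shell-safe command line that can be pasted into a terminal.
print(shlex.join(cmd))
```

Leaving deploy_mode at its default mirrors what spark-submit does when --deploy-mode is omitted.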
Unfortunately, this is not currently documented anywhere, and the existing explanation for the distinction between cluster and client modes is highly misleading. In general, cluster mode means the driver runs on one of the worker nodes, just like the executors. The corollary is that the output of the application is not forwarded to the command that launched the application (spark-submit in this case), but is accessible instead through the worker logs. In contrast, client mode means the command that launches the application also launches the driver, while the executors still run on the worker nodes. This means the spark-submit command also returns the output of the application. For instance, it doesn't make sense to run the Spark shell in cluster mode, because the stdin / stdout / stderr will not be redirected to the spark-submit command.

If you are hosting your own cluster and can launch applications from within the cluster, then there is little benefit to launching your application in cluster mode, which is primarily intended to cut down the latency between the driver and the executors in the first place.

However, if you are still intent on using standalone-cluster mode after all, you can use the deprecated way of launching org.apache.spark.deploy.Client directly through bin/spark-class. Note that this is not recommended and only serves as a temporary workaround until we fix standalone-cluster mode through spark-submit.

I have filed the relevant issues: https://issues.apache.org/jira/browse/SPARK-2259 and https://issues.apache.org/jira/browse/SPARK-2260. Thanks for pointing this out, and we will get to fixing these shortly.

Best,
Andrew

2014-06-20 6:06 GMT-07:00 Gino Bustelo <lbust...@gmail.com>:

> I've found that the jar will be copied to the worker from hdfs fine, but
> it is not added to the spark context for you.
> You have to know that the jar will end up in the driver's working dir,
> and so you just add the file name of the jar to the context in your
> program.
>
> In your example below, just add "test.jar" to the context.
>
> Btw, the context will not have the master URL either, so add that while
> you are at it.
>
> This is a big issue. I posted about it a week ago and got no replies.
> Hopefully it gets more attention as more people start hitting this.
> Basically, spark-submit on standalone cluster with cluster deploy is broken.
>
> Gino B.
>
> On Jun 20, 2014, at 2:46 AM, randylu <randyl...@gmail.com> wrote:
> >
> > in addition, jar file can be copied to driver node automatically.
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-cluster-mode-of-spark-1-0-0-tp7982p7984.html