http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
2015-12-29 11:48 GMT-08:00 Annabel Melongo <melongo_anna...@yahoo.com>:

> Greg,
>
> Can you please send me a doc describing the standalone cluster mode?
> Honestly, I've never heard of it.
>
> The three different modes I've listed appear in the last paragraph of
> this doc: Running Spark Applications
> <http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_running_spark_apps.html>
>
> On Tuesday, December 29, 2015 2:42 PM, Andrew Or <and...@databricks.com> wrote:
>
> > The confusion here is the expression "standalone cluster mode". Either
> > it's stand-alone or it's cluster mode, but it can't be both.
>
> @Annabel That's not true. There *is* a standalone cluster mode, where the
> driver runs on one of the workers instead of on the client machine. What
> you're describing is standalone client mode.
>
> 2015-12-29 11:32 GMT-08:00 Annabel Melongo <melongo_anna...@yahoo.com>:
>
> Greg,
>
> The confusion here is the expression "standalone cluster mode". Either
> it's stand-alone or it's cluster mode, but it can't be both.
>
> With this in mind, here's how jars are uploaded:
> 1. Spark standalone mode: the client and the driver run on the same
>    machine; use the --packages option to submit a jar.
> 2. YARN cluster mode: the client and the driver run on separate machines;
>    additionally, the driver runs as a thread in the ApplicationMaster;
>    use the --jars option with a globally visible path to said jar.
> 3. YARN client mode: the client and the driver run on the same machine;
>    the driver is *NOT* a thread in the ApplicationMaster; use --packages
>    to submit a jar.
>
> On Tuesday, December 29, 2015 1:54 PM, Andrew Or <and...@databricks.com> wrote:
>
> Hi Greg,
>
> It's actually intentional for standalone cluster mode to not upload jars.
> One of the reasons why YARN takes at least 10 seconds before running any
> simple application is that there's a lot of random overhead (e.g.,
> putting jars in HDFS). If this missing functionality is not documented
> somewhere, then we should add that.
>
> Also, the packages problem seems legitimate. Thanks for reporting it. I
> have filed https://issues.apache.org/jira/browse/SPARK-12559.
>
> -Andrew
>
> 2015-12-29 4:18 GMT-08:00 Greg Hill <greg.h...@rackspace.com>:
>
> On 12/28/15, 5:16 PM, "Daniel Valdivia" <h...@danielvaldivia.com> wrote:
>
> > Hi,
> >
> > I'm trying to submit a job to a small Spark cluster running in
> > standalone mode; however, it seems like the jar file I'm submitting to
> > the cluster is "not found" by the worker nodes.
> >
> > I might have misunderstood, but I thought the driver node would send
> > this jar file to the worker nodes. Or should I manually send this file
> > to each worker node before I submit the job?
>
> Yes, you have misunderstood, but so did I. The problem is that
> --deploy-mode cluster runs the Driver on the cluster as well, and you
> don't know which node it's going to run on, so every node needs access
> to the JAR. spark-submit does not pass the JAR along to the Driver, but
> the Driver will pass it to the executors.
>
> I ended up putting the JAR in HDFS and passing an hdfs:// path to
> spark-submit.
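> For example, something along these lines (just a sketch; the paths,
> master URL, and class name are illustrative):
>
>     # put the application JAR somewhere every node can reach it
>     hdfs dfs -mkdir -p /user/greg/jars
>     hdfs dfs -put myapp-1.0.jar /user/greg/jars/
>
>     # submit with an hdfs:// path instead of a local one
>     spark-submit \
>       --master spark://master-host:7077 \
>       --deploy-mode cluster \
>       --class com.example.MyApp \
>       hdfs:///user/greg/jars/myapp-1.0.jar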
> This is a subtle difference from Spark on YARN, which does pass the JAR
> along to the Driver automatically, and IMO should probably be fixed in
> spark-submit. It's really confusing for newcomers.
>
> Another problem I ran into, and that you might too, is that --packages
> doesn't work with --deploy-mode cluster. It downloads the packages to a
> temporary location on the node running spark-submit, then passes those
> paths to the node that is running the Driver, but since that isn't the
> same machine, it can't find anything and fails. The Driver process
> *should* be the one doing the downloading, but it isn't. I ended up
> having to create a fat JAR with all of the dependencies to get around
> that one.
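> For instance (again only a sketch; the build tool and names are
> illustrative), with the sbt-assembly plugin you can bundle the
> dependencies into one JAR and skip --packages entirely:
>
>     # build a single JAR containing the app plus all its dependencies
>     # (assumes the sbt-assembly plugin is configured for the project)
>     sbt assembly
>
>     # ship the fat JAR to HDFS and submit it; no --packages needed
>     hdfs dfs -put target/scala-2.10/myapp-assembly-1.0.jar /user/greg/jars/
>     spark-submit \
>       --master spark://master-host:7077 \
>       --deploy-mode cluster \
>       --class com.example.MyApp \
>       hdfs:///user/greg/jars/myapp-assembly-1.0.jar
>
> Greg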