Hi Mark,

As I said, I've only managed to develop a limited understanding of how Spark works in the different deploy modes ;-)
But somehow I thought that cluster deploy mode in Spark Standalone was not supported. I think I've recently seen a JIRA with a change where something along those lines was said. Can't find it now :(

Pozdrawiam,
Jacek

--
Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

On Mon, Nov 30, 2015 at 6:58 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> Standalone mode also supports running the driver on a cluster node. See
> "cluster" mode in
> http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
> Also,
> http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
>
> On Mon, Nov 30, 2015 at 9:47 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi,
>>
>> My understanding of Spark on YARN, and even Spark in general, is very
>> limited, so keep that in mind.
>>
>> I'm not sure why you compare yarn-cluster and Spark Standalone. In
>> yarn-cluster mode the driver runs on a node in the YARN cluster, while
>> Spark Standalone keeps the driver on the machine from which you launched
>> the Spark application. Also, YARN supports retrying applications while
>> Standalone doesn't. There's also support for rack locality preference
>> (but I don't know whether and where Spark uses it).
>>
>> My limited understanding suggests using Spark on YARN if you're
>> considering Hadoop/HDFS and submitting jobs through YARN. Standalone is
>> an entry-level option, whereas requiring YARN could kill introducing
>> Spark to organizations without Hadoop YARN.
>>
>> Just my two cents.
>>
>> Pozdrawiam,
>> Jacek
>>
>> --
>> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
>> http://blog.jaceklaskowski.pl
>> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
>> Follow me at https://twitter.com/jaceklaskowski
>> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>>
>> On Fri, Nov 27, 2015 at 8:36 AM, cs user <acldstk...@gmail.com> wrote:
>> > Hi All,
>> >
>> > Apologies if this question has been asked before. I'd like to know if
>> > there are any downsides to running Spark over YARN with the
>> > --master yarn-cluster option vs having a separate Spark Standalone
>> > cluster to execute jobs?
>> >
>> > We're looking at installing an HDFS/Hadoop cluster with Ambari and
>> > submitting jobs to the cluster using YARN, or having an Ambari cluster
>> > and a separate standalone Spark cluster, which will run the Spark jobs
>> > on data within HDFS.
>> >
>> > With YARN, will we still get all the benefits of Spark?
>> >
>> > Will it be possible to process streaming data?
>> >
>> > Many thanks in advance for any responses.
>> >
>> > Cheers!
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
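For what it's worth, the two deployment styles discussed in this thread can be sketched as spark-submit invocations. The host names, the application class, and the jar name below are placeholders; the flags themselves (`--master`, `--deploy-mode`, `--supervise`) and the `spark.yarn.maxAppAttempts` property are standard Spark options.

```shell
# YARN cluster mode: the driver runs on a node inside the YARN cluster,
# and YARN can retry the whole application on failure (maxAppAttempts).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=2 \
  --class com.example.MyApp \
  my-app.jar

# Standalone cluster mode: the driver also runs on a cluster node
# (the mode Mark points to in the standalone docs); --supervise asks the
# standalone master to restart the driver if it exits with a non-zero code.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyApp \
  my-app.jar
```

In both cases the driver leaves the submitting machine, which is the point Mark makes above: standalone client mode keeps the driver local, but standalone *cluster* mode does not.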