Hi Mark,

As I said, I've only managed to develop a limited understanding of how Spark works in the different deploy modes ;-)
But somehow I thought that cluster deploy mode in Spark Standalone was not supported. I think I've recently seen a JIRA with a change where something along those lines was said. Can't find it now :(

Pozdrawiam,
Jacek

--
Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

On Mon, Nov 30, 2015 at 6:58 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> Standalone mode also supports running the driver on a cluster node. See
> "cluster" mode in
> http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
> Also,
> http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
>
> On Mon, Nov 30, 2015 at 9:47 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi,
>>
>> My understanding of Spark on YARN, and even Spark in general, is very
>> limited, so keep that in mind.
>>
>> I'm not sure why you compare yarn-cluster and Spark Standalone. In
>> yarn-cluster mode the driver runs on a node in the YARN cluster, while
>> Spark Standalone keeps the driver on the machine from which you launched
>> the Spark application. Also, YARN supports retrying applications while
>> Standalone doesn't. There's also support for rack locality preference
>> (but I don't know whether and where Spark uses it).
>>
>> My limited understanding suggests using Spark on YARN if you're
>> considering Hadoop/HDFS and submitting jobs through YARN. Standalone is
>> an entry-level option, whereas requiring YARN could kill introducing
>> Spark to organizations without Hadoop YARN.
>>
>> Just my two cents.
>>
>> Pozdrawiam,
>> Jacek
>>
>> --
>> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
>> http://blog.jaceklaskowski.pl
>> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
>> Follow me at https://twitter.com/jaceklaskowski
>> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>>
>> On Fri, Nov 27, 2015 at 8:36 AM, cs user <acldstk...@gmail.com> wrote:
>> > Hi All,
>> >
>> > Apologies if this question has been asked before. I'd like to know if
>> > there are any downsides to running Spark over YARN with the
>> > --master yarn-cluster option vs having a separate Spark Standalone
>> > cluster to execute jobs?
>> >
>> > We're looking at installing an HDFS/Hadoop cluster with Ambari and
>> > submitting jobs to the cluster using YARN, or having an Ambari cluster
>> > and a separate standalone Spark cluster, which will run the Spark jobs
>> > on data within HDFS.
>> >
>> > With YARN, will we still get all the benefits of Spark?
>> >
>> > Will it be possible to process streaming data?
>> >
>> > Many thanks in advance for any responses.
>> >
>> > Cheers!
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
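For what it's worth, the two deployment styles discussed in this thread can be sketched as spark-submit invocations. The host names, the application class, and the jar name below are placeholders; the flags themselves (`--master`, `--deploy-mode`, `--supervise`) and the `spark.yarn.maxAppAttempts` property are standard Spark options.

```shell
# YARN cluster mode: the driver runs on a node inside the YARN cluster,
# and YARN can retry the whole application on failure (maxAppAttempts).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=2 \
  --class com.example.MyApp \
  my-app.jar

# Standalone cluster mode: the driver also runs on a cluster node
# (the mode Mark points to in the standalone docs); --supervise asks the
# standalone master to restart the driver if it exits with a non-zero code.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyApp \
  my-app.jar
```

In both cases the driver leaves the submitting machine, which is the point Mark makes above: standalone client mode keeps the driver local, but standalone *cluster* mode does not.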