Re: different in spark on yarn mode and standalone mode

2014-05-16 Thread Sandy Ryza
We made several stabilization changes to Spark on YARN that made it into Spark 0.9.1 and CDH5.0. Spark 1.0 significantly simplifies submitting a Spark app to a YARN cluster (wildly different invocations are no longer needed for yarn-client and yarn-cluster mode). I'm not sure about who is running it in

Re: different in spark on yarn mode and standalone mode

2014-05-16 Thread Sandy Ryza
Hi Vipul, Some advantages of using YARN: * YARN allows you to dynamically share and centrally configure the same pool of cluster resources between all frameworks that run on YARN. You can throw your entire cluster at a MapReduce job, then use some of it on an Impala query and the rest on Spark ap

Re: different in spark on yarn mode and standalone mode

2014-05-16 Thread Vipul Pandey
Thanks for responding, Sandy. YARN for sure is a more mature way of working on shared resources. I was not sure about how stable Spark on YARN is and if anyone is using it in production. I have been using Standalone mode in our dev cluster but multi-tenancy and resource allocation wise it's di

Re: different in spark on yarn mode and standalone mode

2014-05-16 Thread Vipul Pandey
And I thought I sent it to the right list! Here you go again. Question below: On May 14, 2014, at 3:06 PM, Vipul Pandey wrote: > So here's a followup question: What's the preferred mode? > We have a new cluster coming up with petabytes of data and we intend to take > Spark to production. W

Re: different in spark on yarn mode and standalone mode

2014-05-15 Thread Vipul Pandey
So here's a followup question: What's the preferred mode? We have a new cluster coming up with petabytes of data and we intend to take Spark to production. We are trying to figure out what mode would be safe and stable for a production-like environment. Pros and cons? Anyone? Any reasons why o

RE: different in spark on yarn mode and standalone mode

2014-05-04 Thread Liu, Raymond
At the core, they are not very different. In standalone mode, you have a Spark master and Spark workers that allocate the driver and executors for your Spark app, while in YARN mode, the YARN resource manager and node managers do this work. When the driver and executors have been launched, the rest part of res
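
As a rough illustration of the point made in this thread (the modes differ mainly in which cluster manager launches the driver and executors), here is a minimal Python sketch that builds spark-submit argument lists for the three modes discussed. The host name, jar name, class name, and the helper spark_submit_args are hypothetical and used only for illustration; the master values reflect Spark 1.0-era conventions, so check the documentation for your version before relying on them.

```python
def spark_submit_args(master, main_class, app_jar):
    """Return a spark-submit argument list; only `master` varies by mode."""
    return ["spark-submit", "--class", main_class, "--master", master, app_jar]


# Hypothetical application names, used only for illustration.
MAIN_CLASS = "com.example.MyApp"
APP_JAR = "my-app.jar"

# Master values as used around Spark 1.0 (assumption: a standalone master
# listening on the default port 7077 of a hypothetical host).
MASTERS = {
    "standalone": "spark://master-host:7077",
    "yarn-client": "yarn-client",    # driver runs in the submitting process
    "yarn-cluster": "yarn-cluster",  # driver runs inside the YARN cluster
}

commands = {
    mode: spark_submit_args(master, MAIN_CLASS, APP_JAR)
    for mode, master in MASTERS.items()
}

for mode, args in commands.items():
    print(mode, "->", " ".join(args))
```

The application code itself is the same in all three cases; in this sketch only the value passed to --master changes.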