We made several stabilization changes to Spark on YARN that made it
into Spark 0.9.1 and CDH 5.0. Spark 1.0 significantly simplifies submitting a
Spark app to a YARN cluster (wildly different invocations are no longer
needed for yarn-client and yarn-cluster modes).
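As a rough sketch of what the unified 1.0-style submission looks like (the jar name, main class, and resource sizes below are placeholders, not from this thread):

```shell
# Run the driver inside the YARN cluster (yarn-cluster mode):
spark-submit \
  --master yarn-cluster \
  --class com.example.MyApp \
  --num-executors 4 \
  --executor-memory 2g \
  myapp.jar

# Run the driver locally while executors run on YARN (yarn-client mode) --
# same command, only the --master value changes:
spark-submit \
  --master yarn-client \
  --class com.example.MyApp \
  myapp.jar
```

Both invocations assume `HADOOP_CONF_DIR` points at your cluster's Hadoop configuration so spark-submit can find the ResourceManager.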
I'm not sure about who is running it in production.
Hi Vipul,
Some advantages of using YARN:
* YARN allows you to dynamically share and centrally configure the same
pool of cluster resources between all frameworks that run on YARN. You can
throw your entire cluster at a MapReduce job, then use some of it on an
Impala query and the rest on a Spark app.
Thanks for responding, Sandy.
YARN is certainly a more mature way of working with shared resources. I wasn't
sure how stable Spark on YARN is, or whether anyone is using it in production.
I have been using standalone mode in our dev cluster, but in terms of
multi-tenancy and resource allocation it's di
And I thought I sent it to the right list! Here you go again - question below:
On May 14, 2014, at 3:06 PM, Vipul Pandey wrote:
> So here's a follow-up question: what's the preferred mode?
> We have a new cluster coming up with petabytes of data and we intend to take
> Spark to production. W
So here's a follow-up question: what's the preferred mode?
We have a new cluster coming up with petabytes of data, and we intend to take
Spark to production. We are trying to figure out which mode would be safe and
stable for a production-like environment.
Pros and cons? Anyone?
Any reasons why o
At their core, they are not that different.
In standalone mode, the Spark master and Spark workers allocate the driver
and executors for your Spark app.
In YARN mode, the YARN ResourceManager and NodeManagers do this work instead.
Once the driver and executors have been launched, the rest part of res
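The operational difference described above can be sketched as follows (host names and class names are placeholders; the standalone scripts ship in `sbin/` of the Spark distribution):

```shell
# Standalone mode: you run Spark's own cluster daemons, and they
# allocate the driver and executors.
./sbin/start-master.sh      # start the Spark master on this host
./sbin/start-slaves.sh      # start workers on the hosts listed in conf/slaves

spark-submit --master spark://master-host:7077 \
  --class com.example.MyApp myapp.jar

# YARN mode: no Spark daemons to manage. The YARN ResourceManager and
# NodeManagers launch the driver (as part of the ApplicationMaster in
# yarn-cluster mode) and the executors in YARN containers.
spark-submit --master yarn-cluster \
  --class com.example.MyApp myapp.jar
```

In other words, standalone mode trades the extra daemons you must operate yourself for independence from Hadoop, while YARN reuses the resource management your cluster already runs.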