Spark-on-YARN takes 10-30 seconds of setup time for workloads like
WordCount and PageRank on a small-sized cluster and thereafter performs as
well as Spark standalone, as has been noted by Tom and Patrick. However,
a certain amount of configuration/tuning effort is required to match peak
performance. Based on our investigation, certain default parameters in CM
for YARN (e.g., max container size) should be set differently to minimize
config effort.
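
To make the max container size point concrete, here is a rough sketch of the
check involved: an executor can only be scheduled if its container request
fits under YARN's maximum allocation. The 384 MB overhead, the 1024 MB
minimum allocation, and the rounding of requests up to a multiple of the
minimum allocation are assumptions for illustration, not values taken from
our runs; the function names are hypothetical.

```python
import math

def container_request_mb(executor_memory_mb, overhead_mb=384, min_allocation_mb=1024):
    """Approximate container size requested for one executor.

    Assumption: the request is executor memory plus a fixed overhead,
    rounded up to a multiple of the scheduler's minimum allocation.
    """
    raw = executor_memory_mb + overhead_mb
    return math.ceil(raw / min_allocation_mb) * min_allocation_mb

def fits(executor_memory_mb, max_allocation_mb, overhead_mb=384, min_allocation_mb=1024):
    """True if the executor's container request is within the max container size."""
    request = container_request_mb(executor_memory_mb, overhead_mb, min_allocation_mb)
    return request <= max_allocation_mb

# A 4 GB executor needs more than a 4 GB container once overhead is added.
print(container_request_mb(4096))
print(fits(4096, max_allocation_mb=4096))
print(fits(4096, max_allocation_mb=8192))
```

Under these assumptions, a max container size equal to the executor memory is
too small, which is the kind of default that costs tuning effort.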

Thanks,
Nishkam


On Fri, Apr 11, 2014 at 4:36 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> I am using Mesos right now & it works great. Mesos has fine-grained as
> well as coarse-grained allocation & is really useful for prioritizing
> different pipelines.
>  On Apr 11, 2014 1:19 PM, "Patrick Wendell" <pwend...@gmail.com> wrote:
>
>> To reiterate what Tom was saying - the code that runs inside of Spark on
>> YARN is exactly the same code that runs in any deployment mode. There
>> shouldn't be any performance difference once your application starts
>> (assuming you are comparing apples-to-apples in terms of hardware).
>>
>> The differences are just that before your application runs, Spark
>> allocates resources from YARN. This will probably take more time than
>> launching an application against a standalone cluster because YARN's
>> launching mechanism is slower.
>>
>>
>> On Fri, Apr 11, 2014 at 8:43 AM, Tom Graves <tgraves...@yahoo.com> wrote:
>>
>>> I haven't run on Mesos before, but I do run on YARN. The performance
>>> differences are going to be in how long it takes you to get the Executors
>>> allocated.  On YARN that is going to depend on the cluster setup. If you
>>> have dedicated resources to a queue where you are running your Spark job
>>> the overhead is pretty minimal.  Now if your cluster is multi-tenant and is
>>> really busy and other queues are using your capacity it could
>>> take some time.  It is also possible to run into the situation where the
>>> memory of the nodemanagers gets fragmented and you don't have any slots big
>>> enough for you so you have to wait for other applications to finish.  Again
>>> this mostly depends on the setup, how big the containers you need for Spark are,
>>> etc.
>>>
>>> Tom
>>>    On Thursday, April 10, 2014 11:12 AM, Flavio Pompermaier <
>>> pomperma...@okkam.it> wrote:
>>>   Thank you for the reply Mayur, it would be nice to have a comparison
>>> about that.
>>> I hope one day it will be available, or to have the time to test it
>>> myself :)
>>> So you're using Mesos for the moment, right? What are the main
>>> differences in your experience? YARN seems to be more flexible and
>>> interoperable with other frameworks. Am I wrong?
>>>
>>> Best,
>>> Flavio
>>>
>>>
>>> On Thu, Apr 10, 2014 at 5:55 PM, Mayur Rustagi
>>> <mayur.rust...@gmail.com> wrote:
>>>
>>> I've had better luck with standalone in terms of speed & latency. I
>>> think there is an impact, but not a very large one. The bigger impact is on
>>> being able to manage resources & share the cluster.
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>
>>>
>>>
>>> On Wed, Apr 9, 2014 at 12:10 AM, Flavio Pompermaier <
>>> pomperma...@okkam.it> wrote:
>>>
>>> Hi everybody,
>>> I'm new to Spark and I'd like to know whether running Spark on top of YARN or
>>> Mesos could affect its performance, and by how much. Is there any doc about
>>> this?
>>>
>>> Best,
>>> Flavio
>>>
>>>
>>>
>>>
>>
