OK, that is clear.
But what about collect() and collectAsMap()? Could Spark throw a 'java heap
space' error or a 'communication error' because spark.akka.frameSize is too
small? I currently have it set to 1024.
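
For reference, here is a rough sketch of how one might check whether a collected result would fit in the configured frame. It rests on several assumptions: the frame size is given in MB, results are measured with plain Java serialization (Spark's actual wire format may differ), and the property is set as a Java system property before the SparkContext is created, as in Spark 0.8.x. The class FrameSizeCheck and its methods are hypothetical helpers, not part of Spark.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical helper for illustration only; not part of Spark's API.
class FrameSizeCheck {

    // Size in bytes of obj under plain Java serialization (an approximation
    // of what would be sent back to the driver).
    static long serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    // True if the serialized form of obj is no larger than frameSizeMb megabytes.
    static boolean fitsInFrame(Serializable obj, int frameSizeMb) throws IOException {
        return serializedSize(obj) <= (long) frameSizeMb * 1024L * 1024L;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: set as a system property before creating the SparkContext.
        System.setProperty("spark.akka.frameSize", "1024");
        int frameSizeMb = Integer.parseInt(System.getProperty("spark.akka.frameSize"));

        // Stand-in for a small collectAsMap() result.
        HashMap<String, Integer> sample = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            sample.put("key" + i, i);
        }

        System.out.println("serialized bytes: " + serializedSize(sample));
        System.out.println("fits in frame: " + fitsInFrame(sample, frameSizeMb));
    }
}
```

Measuring a representative result this way gives a rough lower bound for the setting, in line with keeping the frame size as small as possible while still holding the largest task or result.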
Thank you!

Best,
Shangyu


2013/12/8 Matei Zaharia <matei.zaha...@gmail.com>

> As I said, it should not affect performance of transformations on RDDs,
> only of sending tasks to the workers and getting results back. In general,
> you want the Akka frame size to be as small as possible while still holding
> your largest task or result; as long as your application isn’t throwing an
> error due to the frame size being too small, you’re fine. Having a bigger
> frame size will result in wasted space and unneeded memory allocation for
> buffers. It doesn’t make the communication more efficient.
>
> Matei
>
>
> On Dec 8, 2013, at 12:57 PM, Shangyu Luo <lsy...@gmail.com> wrote:
>
> I would like to know the maximum value for spark.akka.frameSize too, and I
> am wondering whether it will affect the performance of reduceByKey().
> Thanks!
>
>
> 2013/12/8 Matei Zaharia <matei.zaha...@gmail.com>
>
>> Hey Matt,
>>
>> This setting shouldn’t really affect groupBy operations, because they
>> don’t go through Akka. The frame size setting is for messages from the
>> master to workers (specifically, sending out tasks), and for results that
>> go directly from workers to the application (e.g. collect()). So it
>> shouldn’t be a problem unless these are large. In Spark 0.8.1, results back
>> to the master will be sent in a different way if they’re large, so the
>> setting will only cover task sizes.
>>
>> Matei
>>
>> On Dec 7, 2013, at 10:20 PM, Matt Cheah <mch...@palantir.com> wrote:
>>
>>  Hi everyone,
>>
>>  I'm noticing, like others, that group-by operations with large
>> groups give Spark some trouble. Increasing the spark.akka.frameSize
>> property alleviates it up to a point.
>>
>>  I was wondering what the maximum setting for this value is. I've seen
>> previous e-mails discussing the ramifications of turning up this value,
>> but none stating the actual maximum that can be set for it. I'll
>> benchmark the performance hit accordingly.
>>
>>  Thanks!
>>
>>  -Matt Cheah
>>
>>
>>
>
>
> --
> --
>
> Shangyu, Luo
>
>
>


-- 
--

Shangyu, Luo
