OK, that is clear. But what about collect() and collectAsMap()? Is it possible for Spark to throw a 'java heap space' error or a 'communication error' because spark.akka.frameSize is too small? Currently I have it set to 1024. Thank you!
Best,
Shangyu

2013/12/8 Matei Zaharia <matei.zaha...@gmail.com>

> As I said, it should not affect the performance of transformations on RDDs,
> only of sending tasks to the workers and getting results back. In general,
> you want the Akka frame size to be as small as possible while still holding
> your largest task or result; as long as your application isn't throwing an
> error due to the frame size being too small, you're fine. Having a bigger
> frame size will result in wasted space and unneeded memory allocation for
> buffers. It doesn't make the communication more efficient.
>
> Matei
>
>
> On Dec 8, 2013, at 12:57 PM, Shangyu Luo <lsy...@gmail.com> wrote:
>
> I would like to know the maximum value for spark.akka.frameSize, too, and I
> am wondering if it will affect the performance of reduceByKey().
> Thanks!
>
>
> 2013/12/8 Matei Zaharia <matei.zaha...@gmail.com>
>
>> Hey Matt,
>>
>> This setting shouldn't really affect groupBy operations, because they
>> don't go through Akka. The frame size setting is for messages from the
>> master to workers (specifically, sending out tasks), and for results that
>> go directly from workers to the application (e.g. collect()). So it
>> shouldn't be a problem unless these are large. In Spark 0.8.1, results back
>> to the master will be sent in a different way if they're large, so the
>> setting will only cover task sizes.
>>
>> Matei
>>
>> On Dec 7, 2013, at 10:20 PM, Matt Cheah <mch...@palantir.com> wrote:
>>
>> Hi everyone,
>>
>> Like others, I'm noticing that groupBy operations with large
>> groups give Spark some trouble. Increasing the spark.akka.frameSize
>> property alleviates it up to a point.
>>
>> I was wondering what the maximum setting for this value is. I've seen
>> previous e-mails talking about the ramifications of turning up this value,
>> but I was wondering what the actual maximum number that could be set for it
>> is. I'll benchmark the performance hit accordingly.
>>
>> Thanks!
>>
>> -Matt Cheah
>
> --
> Shangyu, Luo
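Matei's rule of thumb (make the frame just large enough for the largest serialized task or result) can be checked roughly before picking a value. The Java sketch below is an illustration under stated assumptions, not Spark code. `FrameSizeCheck`, `serializedSize` and `fitsInFrame` are hypothetical names. It uses plain Java serialization as a stand-in for the serializer your job actually uses for results, and it ignores Akka's own framing overhead. It also assumes, per the Spark 0.8 configuration docs, that spark.akka.frameSize is given in MB (default 10) and is read from a Java system property. Because each task sends back one partition's data, the object to measure is a single partition's result, not the whole collected dataset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

public class FrameSizeCheck {
    /** Number of bytes obj occupies under plain Java serialization. */
    static long serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            // For an in-memory stream, only serialization failures
            // (e.g. NotSerializableException) end up here.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /**
     * Hypothetical helper: true if obj's serialized form fits in a frame of
     * frameSizeMb megabytes. Akka's message overhead is ignored, so a result
     * that only just fits here may still be rejected in practice.
     */
    static boolean fitsInFrame(Serializable obj, int frameSizeMb) {
        long frameBytes = (long) frameSizeMb * 1024L * 1024L;
        return serializedSize(obj) <= frameBytes;
    }

    public static void main(String[] args) {
        // Assumption: like Spark 0.8, read the setting from a system property,
        // defaulting to 10 (MB).
        int frameSizeMb = Integer.parseInt(System.getProperty("spark.akka.frameSize", "10"));

        // Stand-in for one task's result, e.g. the key/value pairs a single
        // partition returns for collect() or collectAsMap().
        HashMap<Integer, String> partitionResult = new HashMap<>();
        for (int i = 0; i < 200_000; i++) {
            partitionResult.put(i, "value-" + i);
        }

        System.out.println("serialized bytes: " + serializedSize(partitionResult));
        System.out.println("fits in " + frameSizeMb + " MB frame: "
                + fitsInFrame(partitionResult, frameSizeMb));
    }
}
```

Running it with different `-Dspark.akka.frameSize=...` values shows the trade-off Matei describes. The frame only needs to clear the largest per-task payload, and anything beyond that is extra buffer space rather than faster communication.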