Re: Biggest spark.akka.framesize possible

2013-12-08 Thread Matei Zaharia
Hey Matt, This setting shouldn’t really affect groupBy operations, because they don’t go through Akka. The frame size setting is for messages from the master to workers (specifically, sending out tasks), and for results that go directly from workers to the application (e.g. collect()). So it

Re: Biggest spark.akka.framesize possible

2013-12-08 Thread Matei Zaharia
As I said, it should not affect performance of transformations on RDDs, only of sending tasks to the workers and getting results back. In general, you want the Akka frame size to be as small as possible while still holding your largest task or result; as long as your application isn’t throwing
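A rough way to apply the sizing advice above is to measure how large a result is once serialized and compare that with the frame size. The sketch below is only an approximation under stated assumptions: it uses Python's pickle as a stand-in for Spark's own serializer, it assumes the frame size is expressed in megabytes, and the helper names fits_in_frame and smallest_frame_size_mb are hypothetical, not part of any Spark API.

```python
import pickle

MB = 1024 * 1024  # assumption: the frame size is given in megabytes


def serialized_size(obj):
    """Size in bytes of obj after pickling (a stand-in for Spark's serializer)."""
    return len(pickle.dumps(obj))


def fits_in_frame(obj, frame_size_mb):
    """Return True if the pickled object is no larger than frame_size_mb MB."""
    return serialized_size(obj) <= frame_size_mb * MB


def smallest_frame_size_mb(obj):
    """Smallest whole number of MB that can hold the pickled object."""
    size = serialized_size(obj)
    # Ceiling division, with a floor of 1 MB for tiny objects.
    return max(1, -(-size // MB))


if __name__ == "__main__":
    # Hypothetical result, roughly what a collect() might hand back.
    result = [(i, str(i)) for i in range(100000)]
    print("serialized bytes:", serialized_size(result))
    print("fits in 10 MB frame:", fits_in_frame(result, 10))
    print("smallest whole-MB frame:", smallest_frame_size_mb(result))
```

Real messages carry extra framing and serializer overhead, so a measured size close to the limit should be treated as too large rather than as a safe fit.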

Re: Biggest spark.akka.framesize possible

2013-12-08 Thread Shangyu Luo
OK, that is clear. But what about collect() and collectAsMap()? Is it possible that Spark throws a 'java heap space' error or a 'communication error' because spark.akka.frameSize is too small? Currently I have it set to 1024. Thank you! Best, Shangyu

Biggest spark.akka.framesize possible

2013-12-07 Thread Matt Cheah
Hi everyone, I'm noticing, like others, that groupBy operations with large groups give Spark some trouble. Increasing the spark.akka.frameSize property alleviates it up to a point. I was wondering what the maximum setting for this value is. I've seen previous e-mails talking about the