Hey Matt,
This setting shouldn’t really affect groupBy operations, because they don’t go
through Akka. The frame size setting is for messages from the master to workers
(specifically, sending out tasks), and for results that go directly from
workers to the application (e.g. collect()). So it
As I said, it should not affect performance of transformations on RDDs, only of
sending tasks to the workers and getting results back. In general, you want the
Akka frame size to be as small as possible while still holding your largest
task or result; as long as your application isn’t throwing
OK, that is clear.
But what about collect() and collectAsMap()? Is it possible for Spark to
throw a 'java heap space' error or a 'communication error' because
spark.akka.frameSize is too small? I currently have it set to 1024.
Thank you!
Best,
Shangyu
2013/12/8 Matei Zaharia matei.zaha...@gmail.com
As I
Hi everyone,
I'm noticing, like others, that groupBy operations with large groups give
Spark some trouble. Increasing the spark.akka.frameSize property alleviates it
up to a point.
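
For reference, the sizing rule mentioned elsewhere in this thread (keep the frame size as small as possible while still holding the largest task or result) can be sketched roughly as below. This is only an illustration, not anything from Spark itself: min_frame_size_mb, frame_size_option and the 10% headroom are made-up names and values, and it assumes spark.akka.frameSize is given in whole megabytes and passed as a Java system property, which is my understanding for Spark releases of this period.

```python
MB = 1024 * 1024


def min_frame_size_mb(largest_message_bytes, headroom_percent=10):
    """Return the smallest whole-MB frame size that holds the largest message.

    largest_message_bytes: size of the biggest serialized task or result
    you expect (an estimate you supply; Spark does not report it here).
    headroom_percent: hypothetical safety margin added on top.
    """
    if largest_message_bytes < 0:
        raise ValueError("size must be non-negative")
    # Integer arithmetic avoids floating-point rounding surprises.
    padded = largest_message_bytes * (100 + headroom_percent)
    divisor = 100 * MB
    frame_mb = (padded + divisor - 1) // divisor  # ceiling division
    return max(1, frame_mb)


def frame_size_option(largest_message_bytes):
    """Format the value as a JVM system property string (assumed form)."""
    return "-Dspark.akka.frameSize=%d" % min_frame_size_mb(largest_message_bytes)


if __name__ == "__main__":
    # Example: the largest collect() result is estimated at 25 MB.
    print(frame_size_option(25 * MB))
```

With a 25 MB estimate and the assumed 10% headroom this suggests 28, far below a setting like 1024. It only covers tasks and directly returned results, not the shuffle data behind groupBy.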
I was wondering what the maximum setting for this value is. I've seen previous
e-mails talking about the