I used to hit this issue when my data size was large and the number of partitions was high (> 1200). I got rid of it by:
- Reducing the number of partitions
- Setting the following while creating the SparkContext (a short sketch of
  wiring these in is at the bottom of this message):

    .set("spark.rdd.compress", "true")
    .set("spark.storage.memoryFraction", "1")
    .set("spark.core.connection.ack.wait.timeout", "600")
    .set("spark.akka.frameSize", "50")

Thanks
Best Regards

On Sun, Oct 19, 2014 at 6:52 AM, marylucy <qaz163wsx_...@hotmail.com> wrote:
> When doing a groupBy on big data, maybe 500 GB, some partition tasks
> succeed and some fail with a FetchFailed error. Spark retries the
> previous stage, but it always fails.
> 6 computers: 384 GB
> Worker: 40 GB * 7 for one computer
>
> Can anyone tell me why the fetch failed?
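For reference, a minimal sketch (Scala) of wiring those settings into the
SparkContext; the app name and master URL below are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("my-app")                   // placeholder
      .setMaster("spark://master:7077")       // placeholder
      .set("spark.rdd.compress", "true")      // compress serialized RDD blocks
      .set("spark.storage.memoryFraction", "1")
      .set("spark.core.connection.ack.wait.timeout", "600") // seconds
      .set("spark.akka.frameSize", "50")      // driver/executor message size, MB
    val sc = new SparkContext(conf)

    // The partition count can also be reduced explicitly before the shuffle, e.g.:
    // rdd.coalesce(1000)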