groupByKey is taking more time

2014-03-28 Thread mohit.goyal
Hi, I have two RDDs, RDD1 = (K1, V1) and RDD2 = (K1, V1), e.g. (1, List("A","B","C")) and (1, List("D","E","F")), and I run RDD1.groupByKey(RDD2), where K1 is an Integer and V1 is a List of String. If I keep the size of V1 at 3 (a list of three strings), the groupByKey operation takes 2.6 m. If I keep the size of V1 at 20 (a list of 20 strings), the groupByKe
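For reference, Spark's groupByKey does not take a second RDD as an argument; its overloads accept only a partition count or a partitioner. Assuming the intent of the post is to collect the values for the same key from both datasets, the usual Spark spellings would be RDD1.union(RDD2).groupByKey() or RDD1.cogroup(RDD2). The minimal sketch below uses plain Scala collections in place of RDDs so that it runs without a cluster; the object name GroupByKeySketch and the helper unionGroupByKey are hypothetical and not part of Spark or of the original post.

```scala
// Plain Scala collections stand in for the two pair RDDs; no Spark dependency.
object GroupByKeySketch {
  // Hypothetical helper (not a Spark API): concatenates two keyed datasets and
  // groups the values by key, mirroring the result shape of
  // rdd1.union(rdd2).groupByKey() on pair RDDs.
  def unionGroupByKey[K, V](a: Seq[(K, V)], b: Seq[(K, V)]): Map[K, Seq[V]] =
    (a ++ b).groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }

  def main(args: Array[String]): Unit = {
    // Sample data taken from the post: one key, each value a list of strings.
    val rdd1 = Seq(1 -> List("A", "B", "C"))
    val rdd2 = Seq(1 -> List("D", "E", "F"))

    // Each key maps to the sequence of value lists found in either dataset.
    val grouped = unionGroupByKey(rdd1, rdd2)
    println(grouped)
  }
}
```

Every value list for a key ends up in one place, so on a real cluster the amount of data moved in the shuffle grows with the size of V1. The sketch only illustrates the result shape and does not reproduce the timings reported in the post.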

Spark Job stuck

2014-03-20 Thread mohit.goyal
Hi, I ran a Spark application to process ~14 GB of input data with 10 GB of executor memory. The job got stuck with the message below: 14/03/21 05:02:07 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, guavus-0102bf, 49347, 0) with no recent heart beats: 85563ms exc