This usually happens when one of the workers is stuck in a GC pause and times out. Enable the following settings when creating the SparkContext:

sc.set("spark.rdd.compress","true")
sc.set("spark.storage.memoryFraction","1")
sc.set("spark.core.connection.ack.wait.timeout","6000")
sc.set("spark.akka.frameSize","100")

Thanks
Best Regards

On Sat, Nov 15, 2014 at 12:46 AM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

> Hello all. I have been running a Spark Job that eventually needs to do a
> large join.
>
> 24 million x 150 million
>
> A broadcast join is infeasible in this instance clearly, so I am instead
> attempting to do it with Hash Partitioning by defining a custom partitioner
> as:
>
> class RDD2Partitioner(partitions: Int) extends HashPartitioner(partitions) {
>
>   override def getPartition(key: Any): Int = key match {
>     case k: Tuple2[Int, String] => super.getPartition(k._1)
>     case _ => super.getPartition(key)
>   }
>
> }
>
> I then partition both arrays using this partitioner. However, the job
> eventually fails with the following exception which if I had to guess
> indicated that a network connection was interrupted during the shuffle stage,
> causing things to get lost and ultimately resulting in a fetch failure:
>
> 14/11/14 12:56:21 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(innovationdatanode08.cof.ds.capitalone.com,37590)
> 14/11/14 12:56:21 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@7369b398
> 14/11/14 12:56:21 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@7369b398
> java.nio.channels.CancelledKeyException
>     at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
>     at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
>
> In the spark UI, I still see a substantial amount of shuffling going on at
> this stage, I am wondering if I’m perhaps using the partitioner incorrectly?
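
For reference, here is a minimal sketch of how the four settings at the top of this reply are typically applied. It assumes they are set on a SparkConf before the context is constructed, since a running SparkContext does not pick up configuration changes. The application name is a placeholder, and the master is assumed to be supplied by spark-submit. The property names and values are taken as given in the reply and belong to the Spark 1.x releases of that time, so check them against the configuration page for the version you run.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Build the configuration first; settings must be in place before the
// SparkContext is created.
val conf = new SparkConf()
  .setAppName("large-join") // hypothetical name, not from the thread
  .set("spark.rdd.compress", "true")
  .set("spark.storage.memoryFraction", "1")
  .set("spark.core.connection.ack.wait.timeout", "6000")
  .set("spark.akka.frameSize", "100")

// The master URL is assumed to come from spark-submit (--master ...).
val sc = new SparkContext(conf)
```

The same properties can also be passed on the command line with repeated `--conf key=value` options to spark-submit, which avoids hard-coding them in the job.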