Folks,
We are running a streaming job on a cluster that is failing with
'ERROR BlockGenerator: Error in block pushing thread', caused by a Futures
timeout. The cluster is running on EC2, and data is fed in via a Kafka
client. I have been looking for network communication issues but haven't
found any smoking gun. Checking the docs, the only 10-second timer I have
found is spark.executor.heartbeatInterval, and I'd rather not increase it
much further.
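For reference, this is roughly how I'm raising the candidate timeouts (property names taken from the 1.1 configuration docs; whether either of them actually governs this particular 10-second future is exactly what I'm unsure of, and the values here are illustrative only):

```scala
import org.apache.spark.SparkConf

// Sketch only: bumping the two timeout-related properties I've found so far.
val conf = new SparkConf()
  .setAppName("streaming-job")                       // hypothetical app name
  .set("spark.executor.heartbeatInterval", "30000")  // ms; the 10-second default I mentioned
  .set("spark.akka.askTimeout", "60")                // seconds; driver ask timeout
```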
Has anyone encountered this issue or a similar one before?
Relevant info:
The cluster is running in standalone mode on a series of AWS EC2 nodes.
Spark: 1.1.2
Java: 1.8.0_25-b17
HDFS: supplied by CDH 5.3.0
Stack Trace:
15/01/25 06:15:30 ERROR BlockGenerator: Error in block pushing thread
org.apache.spark.SparkException: Error sending message [message = UpdateBlockInfo(BlockManagerId(0, XXXXX-node-0002, 33603, 0),input-3-1422166491400,StorageLevel(false, true, false, false, 1),1736759,0,0)]
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
    at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:213)
    at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:58)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:315)
    at org.apache.spark.storage.BlockManager.reportBlockStatus(BlockManager.scala:291)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:774)
    at org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:630)
    at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushArrayBuffer(ReceiverSupervisorImpl.scala:113)
    at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl$$anon$2.onPushBlock(ReceiverSupervisorImpl.scala:96)
    at org.apache.spark.streaming.receiver.BlockGenerator.pushBlock(BlockGenerator.scala:140)
    at org.apache.spark.streaming.receiver.BlockGenerator.org$apache$spark$streaming$receiver$BlockGenerator$$keepPushingBlocks(BlockGenerator.scala:113)
    at org.apache.spark.streaming.receiver.BlockGenerator$$anon$1.run(BlockGenerator.scala:57)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
    ... 11 more
Thanks,
Matthew