Hi all,

We are running a standalone Spark cluster that hosts a streaming
application. The application consumes data from Flume using a Flume polling
stream created as follows:

flumeStream = FlumeUtils.createPollingStream(streamingContext,
    socketAddress.toArray(new InetSocketAddress[socketAddress.size()]),
    StorageLevel.MEMORY_AND_DISK_SER(), 100, 5);


The checkpoint directory is configured to be on an HDFS cluster and Spark
workers have their SPARK_LOCAL_DIRS and SPARK_WORKER_DIR defined to be on
their respective local filesystems.

What we are seeing is odd behavior that we are unable to explain. During
normal operation, everything runs as expected, with Flume delivering events
to Spark. However, if I kill one of the HDFS nodes while the application is
running (it does not matter which one), the Flume receiver in Spark stops
delivering any data to the downstream processing.

I enabled debug logging for org.apache.spark.streaming.flume on the Spark
worker nodes and looked at the logs of the worker that runs the Flume
receiver. It keeps receiving data from Flume, as shown in the log sample
below, but the resulting batches in the stream start reporting 0 records as
soon as the HDFS node is killed, and no errors are logged to indicate a
problem.

17/06/20 01:05:42 DEBUG FlumeBatchFetcher: Ack sent for sequence number: 09fa05f59050
17/06/20 01:05:44 DEBUG FlumeBatchFetcher: Received batch of 100 events with sequence number: 09fa05f59052
17/06/20 01:05:44 DEBUG FlumeBatchFetcher: Sending ack for sequence number: 09fa05f59052
17/06/20 01:05:44 DEBUG FlumeBatchFetcher: Ack sent for sequence number: 09fa05f59052
17/06/20 01:05:47 DEBUG FlumeBatchFetcher: Received batch of 100 events with sequence number: 09fa05f59054
17/06/20 01:05:47 DEBUG FlumeBatchFetcher: Sending ack for sequence number: 09fa05f59054
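
To put a number on the gap between what the receiver takes in and what the
batches report, the "Received batch of N events" lines in the worker log can
be tallied and compared with the driver counts. This is only a minimal
sketch, assuming the worker log is a plain-text file with one message per
line in the format shown above; the class name ReceiverLogTally and the
log-path argument are hypothetical and not part of Spark or Flume.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReceiverLogTally {
    // Matches the FlumeBatchFetcher debug message format seen in the log sample.
    private static final Pattern RECEIVED =
        Pattern.compile("Received batch of (\\d+) events");

    /** Sums the event counts over all "Received batch" lines. */
    public static long totalEvents(List<String> logLines) {
        long total = 0;
        for (String line : logLines) {
            Matcher m = RECEIVED.matcher(line);
            if (m.find()) {
                total += Long.parseLong(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: the first argument is the path to a worker log file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Events received by Flume receiver: " + totalEvents(lines));
    }
}
```

If this total keeps growing while the driver prints 0 for the same time
window, the events are being lost somewhere between the receiver and batch
generation rather than on the Flume side.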

The driver output for the application (printed via
DStream.count().map().print()) shows:

-------------------------------------------
Time: 1497920770000 ms
-------------------------------------------
Received 0     flume events.


Any insights about where to look in order to find the root cause will be
greatly appreciated.

Thanks
N B
