From within a Spark job you can register a periodic StreamingListener:

    ssc.addStreamingListener(new PeriodicStatisticsListener(Seconds(60)))
    import org.apache.spark.streaming.{Duration, Time}
    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}
    import org.joda.time.{DateTime, DateTimeZone}
    import org.slf4j.LoggerFactory

    class PeriodicStatisticsListener(timePeriod: Duration) extends StreamingListener {
      private val logger = LoggerFactory.getLogger("Application")
      private var startTime = Time(0) // set to the first batch time on the first callback

      override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
        if (startTime == Time(0)) {
          startTime = batchCompleted.batchInfo.batchTime
        }
        logger.info("Batch Complete @ " +
          new DateTime(batchCompleted.batchInfo.batchTime.milliseconds).withZone(DateTimeZone.UTC) +
          " (" + batchCompleted.batchInfo.batchTime + ")" +
          " with records " + batchCompleted.batchInfo.numRecords +
          " in processing time " +
          batchCompleted.batchInfo.processingDelay.getOrElse(0L) / 1000 + " seconds")
      }
    }

On Mon, Feb 8, 2016 at 11:34 AM, Chen Song <chen.song...@gmail.com> wrote:
> Apologize in advance if someone has already asked and addressed this
> question.
>
> In Spark Streaming, how can I programmatically get the batch statistics
> like schedule delay, total delay and processing time (they are shown in the
> job UI streaming tab)? I need such information to raise alerts in some
> circumstances. For example, if the scheduling is delayed more than a
> threshold.
>
> Thanks,
> Chen
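To address the alerting part of the original question: the same onBatchCompleted hook exposes schedulingDelay and totalDelay on BatchInfo (both Option[Long], in milliseconds), so a threshold check can live in the listener itself. A minimal sketch, where SchedulingDelayAlertListener and maxDelayMs are illustrative names and the actual alert call is left as a placeholder:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}
import org.slf4j.LoggerFactory

// Sketch: warn whenever a batch's scheduling delay exceeds a threshold.
class SchedulingDelayAlertListener(maxDelayMs: Long) extends StreamingListener {
  private val logger = LoggerFactory.getLogger("Alerts")

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    // schedulingDelay is an Option[Long] in milliseconds; it is None until computed
    for (delay <- info.schedulingDelay if delay > maxDelayMs) {
      logger.warn(s"Batch ${info.batchTime} scheduling delay $delay ms exceeds $maxDelayMs ms")
      // hook your alerting system here (pager, metrics counter, etc.)
    }
  }
}
```

Registered the same way as above, e.g. ssc.addStreamingListener(new SchedulingDelayAlertListener(30000L)).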