Hi,
it seems like I am unable to shut down my StreamingContext properly, in both
local[n] and yarn-cluster mode. In addition, in yarn-cluster mode (only),
subsequently starting a new StreamingContext raises
an InvalidActorNameException.
I use
  logger.info("stopping StreamingContext")
  staticStreamingContext.stop(stopSparkContext = false, stopGracefully = true)
  logger.debug("done")
and my output logs show
19:16:47.708 [ForkJoinPool-2-worker-11] INFO stopping StreamingContext
[... output from other threads ...]
19:17:07.729 [ForkJoinPool-2-worker-11] WARN scheduler.JobGenerator -
Timed out while stopping the job generator (timeout = 20000)
19:17:07.739 [ForkJoinPool-2-worker-11] DEBUG done
The processing itself is complete, i.e., the batch being processed when stop()
is called finishes, and no further batches are processed.
However, something keeps the StreamingContext from stopping properly. In
local[n] mode this is not actually a problem (other than having to wait 20
seconds for shutdown), but in yarn-cluster mode I get the error
  akka.actor.InvalidActorNameException: actor name [JobGenerator] is not unique!
when I start a newly created StreamingContext (the restart pattern is
sketched below). I was wondering
* what the issue with stop() is, and
* what the difference between local[n] and yarn-cluster mode is.
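For reference, the stop-and-restart pattern looks roughly like this
(simplified; the method name, batch interval, and stream wiring are
placeholders, not the actual code):

  import org.apache.spark.SparkContext
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  def restartStreaming(sc: SparkContext, old: StreamingContext): StreamingContext = {
    // stop only the streaming side, keep the shared SparkContext alive
    old.stop(stopSparkContext = false, stopGracefully = true)
    // creating the replacement context on the same SparkContext is where the
    // "actor name [JobGenerator] is not unique" error shows up in yarn-cluster mode
    val ssc = new StreamingContext(sc, Seconds(10))
    // ... re-register the input streams, then ssc.start() ...
    ssc
  }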
Some possible reasons:
* On my executors, I use a networking library that depends on netty and
doesn't properly shut down the event loop. (That has not been a problem in
the past, though.)
* I have non-empty state (from using updateStateByKey()) that is
checkpointed to /tmp/spark (in local mode) and hdfs:///tmp/spark (in
yarn-cluster mode); could that be an issue? (In fact, I have not seen this
error in any non-stateful streaming applications before; a rough sketch of
the stateful part is below.)
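In case it matters, the stateful part looks roughly like this (the input
source, key type, and update function are placeholders for what the job
actually does):

  import org.apache.spark.streaming.StreamingContext
  import org.apache.spark.streaming.StreamingContext._  // pair-DStream operations

  def setupStatefulStream(ssc: StreamingContext): Unit = {
    // /tmp/spark in local mode, hdfs:///tmp/spark in yarn-cluster mode
    ssc.checkpoint("/tmp/spark")

    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder input source
    val counts = lines
      .map(line => (line, 1L))
      .updateStateByKey[Long] { (newValues: Seq[Long], state: Option[Long]) =>
        Some(state.getOrElse(0L) + newValues.sum)
      }
    counts.print()
  }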
Any help much appreciated!
Thanks
Tobias