Hi,
it seems like I am unable to shut down my StreamingContext properly, in both
local[n] and yarn-cluster mode. In addition, in yarn-cluster mode (only),
subsequently starting a new StreamingContext raises
an InvalidActorNameException.
I use
  logger.info("stopping StreamingContext")
  staticStreamingContext.stop(stopSparkContext = false, stopGracefully = true)
  logger.debug("done")
and my output logs show
19:16:47.708 [ForkJoinPool-2-worker-11] INFO stopping StreamingContext
[... output from other threads ...]
19:17:07.729 [ForkJoinPool-2-worker-11] WARN scheduler.JobGenerator -
Timed out while stopping the job generator (timeout = 20000)
19:17:07.739 [ForkJoinPool-2-worker-11] DEBUG done
The processing itself is complete, i.e., the batch being processed when stop()
is called finishes, and no further batches are processed.
However, something keeps the StreamingContext from stopping properly. In
local[n] mode this is not actually a problem (other than having to wait 20
seconds for shutdown), but in yarn-cluster mode I get the error
  akka.actor.InvalidActorNameException: actor name [JobGenerator] is not unique!
when I start a newly created StreamingContext (the restart pattern is
sketched below). I was wondering
* what the issue with stop() is, and
* what the difference between local[n] and yarn-cluster mode is.
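For reference, the stop-and-restart pattern looks roughly like this
(simplified; the method name, batch interval, and stream wiring are
placeholders, not the actual code):

  import org.apache.spark.SparkContext
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  def restartStreaming(sc: SparkContext, old: StreamingContext): StreamingContext = {
    // stop only the streaming side, keep the shared SparkContext alive
    old.stop(stopSparkContext = false, stopGracefully = true)
    // creating the replacement context on the same SparkContext is where the
    // "actor name [JobGenerator] is not unique" error shows up in yarn-cluster mode
    val ssc = new StreamingContext(sc, Seconds(10))
    // ... re-register the input streams, then ssc.start() ...
    ssc
  }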
Some possible reasons:
* On my executors, I use a networking library that depends on netty and
doesn't properly shut down the event loop. (That has not been a problem in
the past, though.)
* I have non-empty state (from using updateStateByKey()) that is
checkpointed to /tmp/spark (in local mode) and hdfs:///tmp/spark (in
yarn-cluster mode); could that be an issue? (In fact, I have not seen this
error in any non-stateful streaming applications before; a rough sketch of
the stateful part is below.)
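In case it matters, the stateful part looks roughly like this (the input
source, key type, and update function are placeholders for what the job
actually does):

  import org.apache.spark.streaming.StreamingContext
  import org.apache.spark.streaming.StreamingContext._  // pair-DStream operations

  def setupStatefulStream(ssc: StreamingContext): Unit = {
    // /tmp/spark in local mode, hdfs:///tmp/spark in yarn-cluster mode
    ssc.checkpoint("/tmp/spark")

    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder input source
    val counts = lines
      .map(line => (line, 1L))
      .updateStateByKey[Long] { (newValues: Seq[Long], state: Option[Long]) =>
        Some(state.getOrElse(0L) + newValues.sum)
      }
    counts.print()
  }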
Any help much appreciated!
Thanks
Tobias