I've instrumented checkpointing per the programming guide and I can tell
that Spark Streaming is creating the checkpoint directories, but I'm not
seeing any content created in those directories, nor am I seeing the
effects I'd expect from checkpointing.  I'd expect any data that arrives in
Kafka while the consumers are down to get picked up when the consumers are
restarted; I'm not seeing that.

For now my checkpoint directory is set to the local file system, with the
directory URI in this form:   file:///mnt/dir1/dir2.  I see a
subdirectory named with a UUID being created under there, but no files.

I'm using a custom JavaStreamingContextFactory that creates a
JavaStreamingContext and sets the directory on it via the
checkpoint(String) method.
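
For reference, the wiring looks roughly like this (app name and the actual
Kafka/DStream setup are simplified placeholders; the checkpoint directory
is the one mentioned above):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

    public class CheckpointSetup {
      private static final String CHECKPOINT_DIR = "file:///mnt/dir1/dir2";

      public static void main(String[] args) {
        JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
          @Override
          public JavaStreamingContext create() {
            SparkConf conf = new SparkConf().setAppName("checkpoint-test");
            // 1-second batch duration
            JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(1));
            // Kafka DStream creation and transformations go here, inside the
            // factory, so they become part of the checkpointed graph.
            jssc.checkpoint(CHECKPOINT_DIR);
            return jssc;
          }
        };

        // getOrCreate either restores the context from the checkpoint
        // directory or builds a fresh one via the factory.
        JavaStreamingContext jssc =
            JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, factory);
        jssc.start();
        jssc.awaitTermination();
      }
    }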

I'm currently not invoking the checkpoint(Duration) method on the DStream,
since I want to first rely on Spark's default checkpointing interval.  My
streaming batch duration is set to 1 second (1000 ms).
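
If I were to set it explicitly, it would be something like this (the
DStream name is just a placeholder):

    // Per-DStream checkpoint interval I'm deliberately NOT setting; my
    // understanding is the default is a multiple of the batch interval
    // that is at least 10 seconds.
    wordCounts.checkpoint(Durations.seconds(10));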

Anyone have any idea what might be going wrong?

Also, at what point does Spark delete checkpoint files?

Thanks.
