I've instrumented checkpointing per the programming guide, and I can see that Spark Streaming creates the checkpoint directories, but no content is being written into them, nor am I seeing the effects I'd expect from checkpointing. I'd expect any data that arrives in Kafka while the consumers are down to be picked up when the consumers are restarted; I'm not seeing that.
For now my checkpoint directory is on the local file system, with the directory URI in this form: file:///mnt/dir1/dir2. I see a subdirectory named with a UUID created under there, but no files. I'm using a custom JavaStreamingContextFactory that creates a JavaStreamingContext and sets the directory on it via the checkpoint(String) method. I'm not currently invoking checkpoint(Duration) on the DStream, since I first want to rely on Spark's default checkpointing interval. My streaming batch duration is set to 1 second (1000 ms). Does anyone have an idea what might be going wrong? Also, at what point does Spark delete checkpoint files? Thanks.
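In case it helps to see the shape of the setup, here's a minimal sketch of the factory / getOrCreate pattern I'm describing, per the checkpointing section of the programming guide (Spark 1.x receiver-based Kafka API; the app name, Zookeeper host, consumer group, and topic are placeholders, and my real code differs only in names and in the actual DStream logic):

```java
import java.util.Collections;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class CheckpointedApp {

    private static final String CHECKPOINT_DIR = "file:///mnt/dir1/dir2";

    public static void main(String[] args) {
        JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
            @Override
            public JavaStreamingContext create() {
                SparkConf conf = new SparkConf().setAppName("checkpoint-test");
                // 1-second batches, matching my setup
                JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(1000));
                jssc.checkpoint(CHECKPOINT_DIR);

                // Placeholder Kafka source; host/group/topic are made up.
                // All DStream creation and transformations must happen
                // inside create(), so they can be rebuilt from checkpoint.
                Map<String, Integer> topics = Collections.singletonMap("my-topic", 1);
                JavaPairReceiverInputDStream<String, String> messages =
                        KafkaUtils.createStream(jssc, "zkhost:2181", "my-group", topics);
                messages.print();

                return jssc;
            }
        };

        // On a clean start this calls create(); on restart it is supposed
        // to rebuild the context from the checkpoint data instead.
        JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, factory);
        jssc.start();
        jssc.awaitTermination();
    }
}
```

This is only a skeleton of the driver wiring, not something runnable without a Spark/Kafka deployment on the classpath.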