Hi guys,

We need to checkpoint some state (an RDD that is updated using updateStateByKey), and we would like finer control over the serialization. This would also let us handle schema evolution in the deserialization code when we need to modify the structure of the classes associated with the state.
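To make the schema evolution part concrete, this is roughly what we have in mind. It is only a minimal sketch in plain Python, not Spark code, and the JSON envelope, the version numbers, and the field names ("count", "last_seen") are all made up for illustration:

```python
import json

# Hypothetical: the schema version our current state classes correspond to.
CURRENT_VERSION = 2


def serialize_state(state: dict) -> bytes:
    """Wrap the state in an envelope tagged with the schema version."""
    envelope = {"version": CURRENT_VERSION, "payload": state}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def _upgrade_v1_to_v2(payload: dict) -> dict:
    # Hypothetical change: v2 added a "last_seen" field, defaulting to None.
    upgraded = dict(payload)
    upgraded.setdefault("last_seen", None)
    return upgraded


# Maps an old version to the function that upgrades it by one version.
UPGRADES = {1: _upgrade_v1_to_v2}


def deserialize_state(raw: bytes) -> dict:
    """Read an envelope and upgrade its payload step by step to CURRENT_VERSION."""
    envelope = json.loads(raw.decode("utf-8"))
    version = envelope["version"]
    payload = envelope["payload"]
    if version > CURRENT_VERSION:
        raise ValueError(f"state written by a newer schema version: {version}")
    while version < CURRENT_VERSION:
        payload = UPGRADES[version](payload)
        version += 1
    return payload
```

The point is that state written by an older build stays readable after a class change, which we do not think we can control with the built-in checkpoint serialization.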
I guess I can use foreachRDD and write the state to any location (either a blob store or DynamoDB). A few questions:

A) How can I make checkpoint recovery read the data from this persisted location?
B) I notice that calling checkpoint cleans up older versions of the checkpoint. Where should I put this cleanup code?
C) My understanding is that checkpointing is atomic. Is there anything I need to be aware of so that I don't lose the atomicity semantics?

Thanks,
Arun