>> when you use getOrCreate, and there exists a valid checkpoint, it will
always return the context from the checkpoint and not call the factory.
Simple way to see whats going on is to print something in the factory to
verify whether it is ever called.

This is probably OK. Seems to explain why we were getting a sticky batch
duration millis value. Once I blew away all the checkpointing directories
and unplugged the data checkpointing (while keeping the metadata
checkpointing) the batch duration millis was no longer sticky.

So, there doesn't seem to be a way for metadata checkpointing to override
its checkpoint duration millis, is there?  Is the default there
max(batchdurationmillis, 10seconds)?  Is there a way to override this?
Thanks.





On Wed, Sep 9, 2015 at 2:44 PM, Tathagata Das <t...@databricks.com> wrote:

>
>
> See inline.
>
> On Tue, Sep 8, 2015 at 9:02 PM, Dmitry Goldenberg <
> dgoldenberg...@gmail.com> wrote:
>
>> What's wrong with creating a checkpointed context??  We WANT
>> checkpointing, first of all.  We therefore WANT the checkpointed context.
>>
>> Second of all, it's not true that we're loading the checkpointed context
>> independent of whether params.isCheckpointed() is true.  I'm quoting the
>> code again:
>>
>> // This is NOT loading a checkpointed context if isCheckpointed() is
>> false.
>> JavaStreamingContext jssc = params.isCheckpointed() ?
>> createCheckpointedContext(sparkConf, params) : createContext(sparkConf,
>> params);
>>
>>   private JavaStreamingContext createCheckpointedContext(SparkConf
>> sparkConf, Parameters params) {
>>     JavaStreamingContextFactory factory = new
>> JavaStreamingContextFactory() {
>>       @Override
>>       public JavaStreamingContext create() {
>>         return createContext(sparkConf, params);
>>       }
>>     };
>>     return *JavaStreamingContext.getOrCreate(params.getCheckpointDir(),
>> factory);*
>>
> ^^^^^   when you use getOrCreate, and there exists a valid checkpoint, it
> will always return the context from the checkpoint and not call the
> factory. Simple way to see whats going on is to print something in the
> factory to verify whether it is ever called.
>
>
>
>
>
>>   }
>>
>>   private JavaStreamingContext createContext(SparkConf sparkConf,
>> Parameters params) {
>>     // Create context with the specified batch interval, in milliseconds.
>>     JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,
>> Durations.milliseconds(params.getBatchDurationMillis()));
>>     // Set the checkpoint directory, if we're checkpointing
>>     if (params.isCheckpointed()) {
>>       jssc.checkpoint(params.getCheckpointDir());
>>
>>     }
>> ...............
>> Again, this is *only* calling context.checkpoint() if isCheckpointed() is
>> true.  And we WANT it to be true.
>>
>> What am I missing here?
>>
>>
>>

Reply via email to