Ok,
I think it was a premature alert :)

1. The framework guarantees that the start method is called only once
per SplitEnumerator instance, hence context.callAsync is called only
once.
2. callAsync uses ScheduledExecutorService::scheduleAtFixedRate under the
hood, so if any execution of this task takes longer than its period, then
subsequent executions may start late, but will not concurrently execute.
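
Point 2 can be checked outside Flink with a small stand-alone sketch. It uses only java.util.concurrent; the class name FixedRateDemo and the helper maxConcurrentRuns are made up for illustration and are not part of Flink. A task that deliberately runs longer than its period is scheduled on a pool with several threads, and the highest number of overlapping runs is recorded.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedRateDemo {

    /** Schedules a slow periodic task and returns the highest number of overlapping runs observed. */
    public static int maxConcurrentRuns(int poolSize, long periodMs, long taskMs, long observeMs) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();

        executor.scheduleAtFixedRate(() -> {
            // Count this run as active and remember the highest active count seen so far.
            maxSeen.accumulateAndGet(running.incrementAndGet(), Math::max);
            try {
                Thread.sleep(taskMs); // deliberately longer than the period
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                running.decrementAndGet();
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);

        try {
            Thread.sleep(observeMs); // let several periods elapse
            executor.shutdownNow();
            executor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        // Pool of 4 threads, 10 ms period, 50 ms task, observed for 300 ms.
        System.out.println("max concurrent runs: " + maxConcurrentRuns(4, 10, 50, 300));
    }
}
```

Even with four threads available, the scheduleAtFixedRate contract quoted above means the reported maximum should stay at 1: a late run is delayed, not started in parallel.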

So I guess we are safe here: the callable is executed one run at a time, and
each result is passed back sequentially to the handleNewSplits method.
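
The hand-off described here can be modelled roughly as follows. This is a plain java.util.concurrent sketch, not Flink code: HandoffDemo, handlerThreads and the "coordinator" thread name are hypothetical. It also assumes, based on my reading of the callAsync Javadoc, that the handler runs on the single source-coordinator thread. The periodic callable runs on a worker pool, and each result is handed to a single-threaded executor that plays the role of handleNewSplits.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HandoffDemo {

    /** Runs the hand-off at least {@code runs} times and returns the names of the threads that ran the handler. */
    public static Set<String> handlerThreads(int runs) {
        ScheduledExecutorService workers = Executors.newScheduledThreadPool(4);
        // Single thread standing in for the source coordinator (an assumption of this sketch).
        ExecutorService coordinator =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "coordinator"));
        Set<String> handlerThreadNames = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(runs);

        workers.scheduleAtFixedRate(() -> {
            // "monitor": computed on whichever worker thread picks up this run.
            String result = "splits-from-" + Thread.currentThread().getName();
            // "handleNewSplits": always executed on the coordinator thread.
            coordinator.execute(() -> {
                if (!result.isEmpty()) {
                    handlerThreadNames.add(Thread.currentThread().getName());
                }
                done.countDown();
            });
        }, 0, 5, TimeUnit.MILLISECONDS);

        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            workers.shutdownNow();
            coordinator.shutdownNow();
        }
        return handlerThreadNames;
    }

    public static void main(String[] args) {
        System.out.println("handler ran on: " + handlerThreads(20));
    }
}
```

In this model the worker thread that produced a result may differ from run to run, but a state update made inside the handler is only ever performed by one thread, which is why a plain reference assignment there does not race.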


On Wed, 19 Jan 2022 at 16:16, Krzysztof Chmielewski <
krzysiek.chmielew...@gmail.com> wrote:

> Hi,
> in documentation for SplitEnumeratorContext::callAsync method we read that:
>
> "(...)  When this method is invoked multiple times, The Callables may be
> executed in a thread pool concurrently.
> It is important to make sure that the callable does not modify any shared
> state, especially the states that will be a part of the
> SplitEnumerator.snapshotState(long) (...)"
>
> The ContinuousHiveSplitEnumerator::start method executes the code below:
>
> enumeratorContext.callAsync(
>         monitor, this::handleNewSplits, discoveryInterval, discoveryInterval);
>
> The handleNewSplits method does this:
>
> this.seenPartitionsSinceOffset = newSplitsAndState.seenPartitions;
>
> My question is: on how many threads will the "monitor" callable be executed?
> If more than one (and this is possible according to the callAsync
> documentation), then
> I think that this reference assignment in the handleNewSplits method could
> lead to bugs, since the threads executing the monitor callable can
> return totally different collections.
>
> For ContinuousFileSplitEnumerator this was resolved in a different way.
> Filtering of already processed paths is done by Source-Coordinator thread
> and there is no reference assignment.
>
> Regards,
> Krzysztof Chmielewski
>
>
