Hi,

You are right that the available options offer little flexibility for
approaching this problem.

How have you partitioned your data? You could probably keep a local variable
specific to each node or compute thread (or perhaps use an Atomic Type,
https://apacheignite.readme.io/docs/atomic-types, though that may reduce
performance) that tracks the keys it has already processed. In your scan
query filter, check whether each key is already in this set of processed
keys and skip it if so. Then, if your scan query throws an exception, you
will already know which keys have been processed, and the rerun can skip
them.
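To make the idea concrete, here is a minimal, stdlib-only Java sketch that simulates it in a single JVM. It does not use Ignite itself. The `BiPredicate` stands in for the `IgniteBiPredicate` filter you would pass to `new ScanQuery<>(filter)`. A `TreeMap` stands in for the cache. The names `processedKeys`, `runScan` and `failAfter` are hypothetical and exist only for illustration. One assumption to keep in mind: in a real cluster the scan filter runs on the nodes that own the data, so the processed-key set would have to live (and survive) there, and a purely in-memory set is lost if that node itself dies.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

public class ResumableScanSketch {

    // Hypothetical per-node/per-thread record of keys already updated.
    static final Set<Integer> processedKeys = ConcurrentHashMap.newKeySet();

    // Stand-in for the ScanQuery filter: only let through keys not yet processed.
    static final BiPredicate<Integer, String> notYetProcessed =
        (key, val) -> !processedKeys.contains(key);

    /**
     * Simulates one pass of a scan over 'cache', updating every entry that
     * passes the filter. If failAfter >= 0, throws once that many entries have
     * been updated in this pass, mimicking a failure midway through the scan.
     * Returns the number of entries updated in this pass.
     */
    static int runScan(Map<Integer, String> cache, int failAfter) {
        int updated = 0;
        for (Map.Entry<Integer, String> e : cache.entrySet()) {
            if (!notYetProcessed.test(e.getKey(), e.getValue()))
                continue; // already handled by an earlier pass
            if (failAfter >= 0 && updated == failAfter)
                throw new IllegalStateException("simulated failure after " + updated + " updates");
            e.setValue(e.getValue().toUpperCase()); // the "field update"
            processedKeys.add(e.getKey());          // record progress only after the update succeeds
            updated++;
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new TreeMap<>();
        for (int i = 0; i < 10; i++) cache.put(i, "value-" + i);

        try {
            runScan(cache, 4); // first attempt dies after 4 updates
        } catch (IllegalStateException ex) {
            System.out.println("Scan failed: " + ex.getMessage());
        }
        int resumed = runScan(cache, -1); // retry skips the 4 keys already processed
        System.out.println("Updated on retry: " + resumed);
    }
}
```

The key is added to the set only after its update succeeds. A failure between the two steps therefore causes at most a repeated update on retry, never a skipped one, which is why the update itself should be safe to apply twice.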

HTH,
RH
https://www.apacheignitetutorial.com/


On Thu, Jan 24, 2019 at 11:03 AM msuh <m...@jobcase.com> wrote:

> Hello,
>
> Our eventual production cluster will be working with many billions of
> entities in many caches, and we have use cases where we need to run a
> ScanQuery over an entire cache to update certain fields.
>
> We expect that there could definitely be failures in the middle of a single
> ScanQuery due to the sheer size of the caches. Since we wouldn't want to
> rerun the ScanQuery from the start, we're wondering whether we could keep
> a checkpoint of how far we've processed in the QueryCursor. The
> QueryCursor API doesn't seem to have any methods that allow that, but
> maybe I'm not looking in the right place? Are there any other efficient
> ways to keep track of roughly how far we've processed? If QueryCursor
> doesn't expose anything, would the partition number be the best
> option?
>
> But from what I've seen, entities in partitions seem to shift around
> (from rebalancing or something?), so I'm not sure that's even possible.
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
