> For Compute.affinityRun, I am not sure how to work with it for my scenario
affinityRun and affinityCall have partition-based overloads (taking int
partId).
Partition-based compute is the reliable way to process all data in the
cluster,
even in the face of topology changes/rebalance (as opposed to localEntries
or local queries).
The whole thing can look like this:
1. From the initiator node, start processing all partitions in parallel
for (int part = 0; part < ignite.affinity(cacheName).partitions(); part++)
    var fut = ignite.compute().affinityCallAsync(cacheNames, part, new
MyJob(part));
2. Inside MyJob, find tenant data with SQL
var entries = cache.query(new SqlFieldsQuery().setPartitions(part)...);
3. Still inside MyJob, process the data in any way, return results from the
job
return process(entries);
4. Aggregate job results on the initiator
Here, Ignite guarantees that steps 2 and 3:
- Operate on local data (the job runs on the node where the partition is
located)
- Run with the partition locked, so the data won't be moved in case of
topology changes
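Putting the steps above together, a minimal sketch could look like the
following. Cache name "users", the tenantId column, and the counting logic
inside MyJob are assumptions for illustration; adjust them to your model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.lang.IgniteFuture;
import org.apache.ignite.resources.IgniteInstanceResource;

public class PartitionScan {
    /** Processes one partition on the node that owns it. */
    static class MyJob implements IgniteCallable<Integer> {
        @IgniteInstanceResource
        private Ignite ignite; // injected on the node where the job runs

        private final int part;

        MyJob(int part) { this.part = part; }

        @Override public Integer call() {
            IgniteCache<?, ?> cache = ignite.cache("users");

            // Step 2: local SQL restricted to this partition.
            SqlFieldsQuery qry = new SqlFieldsQuery(
                    "select _key, _val from Users where tenantId = ?")
                .setArgs("tenant-1")
                .setPartitions(part)
                .setLocal(true);

            // Step 3: process the rows and return a result.
            int count = 0;
            for (List<?> row : cache.query(qry))
                count++; // real processing would go here
            return count;
        }
    }

    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        int parts = ignite.affinity("users").partitions();
        List<IgniteFuture<Integer>> futs = new ArrayList<>(parts);

        // Step 1: one collocated job per partition, all in parallel.
        for (int part = 0; part < parts; part++)
            futs.add(ignite.compute().affinityCallAsync(
                Collections.singletonList("users"), part, new MyJob(part)));

        // Step 4: aggregate per-partition results on the initiator.
        int total = 0;
        for (IgniteFuture<Integer> fut : futs)
            total += fut.get();

        System.out.println("Processed entries: " + total);
    }
}
```

Note that launching all partitions at once can be heavy on a large
cluster; you may want to throttle the number of in-flight jobs.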
On Sat, Apr 24, 2021 at 3:12 AM William.L <[email protected]> wrote:
> Thanks for the pointers stephendarlington, ptupitsyn.
>
> Looks like I can run a mapper that does a local SQL query to get the set of
> keys for the tenant (that resides on the local server node), and then do
> Compute.affinityRun or Cache.invokeAll.
>
> For Cache.invokeAll, it takes a dictionary of keys to EntryProcessor so
> that
> is easy to understand.
>
> For Compute.affinityRun, I am not sure how to work with it for my scenario:
> * It takes an affinity key to find the partition's server to run the
> IgniteRunnable, but I don't see an interface to pass in the specific keys.
> Am
> I expected to pass the key set as part of the IgniteRunnable object?
> * Suppose the cache uses user_id as the affinity key; then it is possible
> that
> two user_ids will map to the same partition. How do I avoid duplicate
> processing/scanning?
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>