1. Use a separate cache as an index: e.g. for every tenant, store a list
of entry IDs for quick retrieval, then use Compute.affinityRun or
Cache.invokeAll to process that subset of the data (first sketch below).
2. Use SQL with an index, but enable it only for the tenantId field: get
the entry IDs for a given tenant with SQL, then again use affinityRun or
invokeAll (second sketch below).
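
A minimal sketch of option 1. The cache names "tenantIndex" (tenantId ->
keys) and "dataCache", and the String key / byte[] value types, are made up
for illustration - in your case the key would be your composite key that
contains the tenant id:

import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class TenantIndexExample {
    /** Runs on the node that owns the given key, so the read is node-local. */
    public static class PerKeyJob implements IgniteRunnable {
        @IgniteInstanceResource
        private transient Ignite ignite; // injected on the executing node

        private final String key;

        public PerKeyJob(String key) { this.key = key; }

        @Override public void run() {
            byte[] val = ignite.<String, byte[]>cache("dataCache").get(key);
            // ... per-entry computation on val goes here ...
        }
    }

    public static void processTenant(Ignite ignite, String tenantId) {
        // Hypothetical index cache: tenantId -> keys of that tenant's entries.
        IgniteCache<String, List<String>> index = ignite.cache("tenantIndex");

        for (String key : index.get(tenantId))
            ignite.compute().affinityRun("dataCache", key, new PerKeyJob(key));
    }
}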
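
And a sketch of option 2, same made-up names. Only the tenantId field is
exposed to SQL and indexed, the payload stays opaque; the cache would need
setIndexedTypes(String.class, MyValue.class) in its configuration:

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import javax.cache.processor.EntryProcessorResult;
import javax.cache.processor.MutableEntry;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheEntryProcessor;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class TenantSqlExample {
    /** Hypothetical value type: only tenantId is indexed for SQL. */
    public static class MyValue {
        @QuerySqlField(index = true)
        private String tenantId;

        private byte[] payload; // complex binary object, not exposed to SQL
    }

    /** Executed on each entry's primary node; the entry is locked meanwhile. */
    public static class MyProcessor implements CacheEntryProcessor<String, MyValue, Integer> {
        @Override public Integer process(MutableEntry<String, MyValue> entry, Object... args) {
            return entry.getValue().payload.length; // stand-in for the real computation
        }
    }

    public static Map<String, EntryProcessorResult<Integer>> processTenant(Ignite ignite, String tenantId) {
        IgniteCache<String, MyValue> cache = ignite.cache("dataCache");

        // Step 1: use the tenantId index to fetch only the matching keys.
        Set<String> keys = new HashSet<>();
        for (List<?> row : cache.query(new SqlFieldsQuery(
                "select _key from MyValue where tenantId = ?").setArgs(tenantId)).getAll())
            keys.add((String)row.get(0));

        // Step 2: process the matching entries in place on their primary nodes.
        return cache.invokeAll(keys, new MyProcessor());
    }
}

invokeAll returns the per-entry results as a map, which you can then feed
into your reduce/aggregation step on the calling side.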
> IgniteCache.localEntries
Be careful with localEntries: while the topology is changing and rebalance
is in progress, you'll miss some data and/or process some of it twice.
Prefer the Cache.invoke and Compute.affinity* APIs instead - they guarantee
that the given piece of data (a key or a partition) is locked for the
duration of the processing (see the partition sketch below).
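
If you'd rather go partition by partition than key by key, the affinityRun
overload that takes a partition number reserves that partition while the
job runs. A sketch, again with the made-up "dataCache" name:

import java.util.Collections;

import javax.cache.Cache;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class PartitionScanExample {
    /** Scans one partition of "dataCache" locally on the node that owns it. */
    public static class PartitionJob implements IgniteRunnable {
        @IgniteInstanceResource
        private transient Ignite ignite; // injected on the executing node

        private final int part;

        public PartitionJob(int part) { this.part = part; }

        @Override public void run() {
            IgniteCache<String, byte[]> cache = ignite.cache("dataCache");

            ScanQuery<String, byte[]> qry = new ScanQuery<>(part);
            qry.setLocal(true); // iterate only the local copy of this partition

            try (QueryCursor<Cache.Entry<String, byte[]>> cur = cache.query(qry)) {
                for (Cache.Entry<String, byte[]> e : cur) {
                    // ... per-entry computation; skip entries of other tenants ...
                }
            }
        }
    }

    public static void scanAllPartitions(Ignite ignite) {
        int parts = ignite.affinity("dataCache").partitions();

        for (int p = 0; p < parts; p++) {
            // Unlike a plain localEntries loop, this reserves partition p for
            // the duration of the job, so rebalance can't move it mid-scan.
            ignite.compute().affinityRun(Collections.singletonList("dataCache"), p, new PartitionJob(p));
        }
    }
}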
On Fri, Apr 23, 2021 at 7:17 PM William.L <[email protected]> wrote:
> Hi,
>
> I am investigating whether the MapReduce API is the right tool for my
> scenario. Here's the context of the caches:
> * Multiple caches for different types of datasets
> * Each cache has multi-tenant data and the tenant id is part of the cache
> key
> * Each cache entry is a complex JSON/binary object that I want to run a
> computation on (let's just say it is hard to do in SQL), returning some
> complex result for each entry (e.g. a dictionary) that I then want to
> reduce/aggregate.
> * The cluster is persistence-enabled because we have more data than memory
>
> My scenario is to run the MapReduce operation only on the data for a
> specific tenant (a small subset of the data). From reading the forum about
> MapReduce, it seems like the best way to do this is to use the
> IgniteCache.localEntries API and iterate through each node's local cache.
> My concern with this approach is that we'd be looping through the whole
> cache (keys and values), which is very inefficient. Is there a more
> efficient way to filter only the relevant keys and then access just the
> matching entries?
>
> Thanks.