MapReduce - How to efficiently scan through subset of the caches?

William.L Fri, 23 Apr 2021 09:16:55 -0700

Hi,

I am investigating whether the MapReduce API is the right tool for my
scenario. Here's the context of the caches:
* Multiple caches, one for each type of dataset
* Each cache has multi-tenant data and the tenant id is part of the cache
key
* Each cache entry is a complex JSON/binary object that I want to run a
computation on (let's just say it is hard to do in SQL). The computation
returns a complex result for each entry (e.g. a dictionary), and I want to
reduce/aggregate those results.
* The cluster has persistence enabled because we have more data than memory


My scenario is to run the MapReduce operation only on the data for a specific
tenant (a small subset of the data). From reading the forum about MapReduce,
it seems the best way to do this is to use the IgniteCache.localEntries
API and iterate through the node's local cache. My concern with this
approach is that it loops through the whole cache (keys and values), which is
very inefficient. Is there a more efficient way to filter only the relevant
keys and then access the matching entries only?
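
To make the concern concrete, here is a rough sketch of the shape of the computation in plain Java. It does not use the Ignite API: a java.util.Map stands in for a node's local entries, and the key layout ("<tenantId>:<entryId>"), the word-count map step, and all class and method names are assumptions made up for illustration. Note that the loop still visits every key, including those of other tenants, which is exactly the cost I would like to avoid.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class TenantScanSketch {
    // Hypothetical key layout "<tenantId>:<entryId>"; the real cache key type may differ.
    static boolean belongsToTenant(String key, String tenantId) {
        return key.startsWith(tenantId + ":");
    }

    // Map step (placeholder): turn one entry's payload into a dictionary of token counts.
    static Map<String, Integer> mapEntry(String payload) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : payload.split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce step: merge the per-entry dictionaries by summing values per key.
    // Every key is inspected; values are only processed for the requested tenant.
    static Map<String, Integer> scanTenant(Map<String, String> localEntries, String tenantId) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map.Entry<String, String> e : localEntries.entrySet()) {
            if (!belongsToTenant(e.getKey(), tenantId)) {
                continue; // key belongs to another tenant, skip without mapping the value
            }
            mapEntry(e.getValue()).forEach((k, v) -> total.merge(k, v, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> local = new HashMap<>();
        local.put("tenantA:1", "x y x");
        local.put("tenantA:2", "y z");
        local.put("tenantB:1", "x x x");
        System.out.println(scanTenant(local, "tenantA")); // prints {x=2, y=2, z=1}
    }
}
```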

Thanks.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
