Thank you, that makes sense.

My use case is to match Spark DataFrame functionality using only C# if
possible, without using Spark.

Specifically, we have CSV files we wish to load into the cache, and then we
have compute functions that act on those rows, adding columns as they go, so
the cache will be heavy on read/write.

To try to improve the initial cache population from file (which can be
millions of rows), I distribute jobs to the cluster, each of which reads a
piece of the file, to get some degree of upload parallelization.
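For context, the per-node loader jobs look roughly like the sketch below, using Ignite.NET's `IDataStreamer` for the bulk insert (the `"rows"` cache name, the line-number key scheme, the naive comma split, and the slice boundaries are all placeholders for illustration, not my actual code):

```csharp
using System.IO;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Datastream;

public static class CsvLoader
{
    // One loader job: stream a slice of the CSV file into the cache.
    public static void LoadSlice(IIgnite ignite, string path,
                                 long firstLine, long lineCount)
    {
        using (IDataStreamer<long, string[]> streamer =
            ignite.GetDataStreamer<long, string[]>("rows"))
        {
            streamer.PerNodeBufferSize = 1024; // batch entries before shipping

            long lineNo = 0;
            foreach (string line in File.ReadLines(path))
            {
                if (lineNo >= firstLine && lineNo < firstLine + lineCount)
                    streamer.AddData(lineNo, line.Split(',')); // naive CSV split

                lineNo++;
            }
        } // Dispose() flushes any remaining buffered entries
    }
}
```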

I am using affinity keys so that the calculations only have to process the
data on the node they run on, which works fine. But then I thought
performance would probably improve on the cache population step if I just
used LOCAL caches. It's the same end result: calculations working off only
the data they have on the node. I can maybe live with the downsides of local
caches, which I assume include no fault tolerance or load balancing, if the
speed improvements make it worthwhile.
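For the LOCAL-cache variant, each node would create its own private cache, something like the following sketch (the cache name and key/value types are made up; note LOCAL mode means no rebalancing and no backups, so the data is gone if the node dies):

```csharp
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;
using Apache.Ignite.Core.Cache.Configuration;

// Per-node local cache: entries live only on the node that writes them.
var cfg = new CacheConfiguration("rows-local")
{
    CacheMode = CacheMode.Local
};

ICache<long, string[]> cache =
    Ignition.GetIgnite().GetOrCreateCache<long, string[]>(cfg);
```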

Anyway, basically, to get my desired functionality I have two options:
either use affinity keys and affinity compute, or use local caches and
broadcast compute.
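In code, the two options would look roughly like this (the `ComputeColumnsAction` class and `affinityKey` value are placeholders; the action body would run my column-computing logic against locally held rows):

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Compute;

[Serializable]
class ComputeColumnsAction : IComputeAction
{
    public void Invoke()
    {
        // Act on locally held rows, adding computed columns (placeholder).
    }
}

public static class Options
{
    public static void Run(IIgnite ignite, object affinityKey)
    {
        ICompute compute = ignite.GetCompute();

        // Option 1: partitioned cache + affinity compute.
        // The job runs on whichever node is primary for affinityKey,
        // so it only touches local partitions of the "rows" cache.
        compute.AffinityRun("rows", affinityKey, new ComputeColumnsAction());

        // Option 2: LOCAL caches + broadcast compute.
        // Every node runs the job against its own private cache.
        compute.Broadcast(new ComputeColumnsAction());
    }
}
```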



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
