The challenge is that by putting a large number of records into a single putAll call, you're effectively creating one huge transaction. Transactions require distributed locks, which are expensive.
You're right that batching can improve throughput (but not latency). That's what the data streamer does. This blog shows a similar approach: https://www.gridgain.com/resources/blog/how-fast-load-large-datasets-apache-ignite-using-key-value-api (The code is Java, but the approach should work for C++.)

> On 26 Apr 2021, at 09:03, jjimeno <jjim...@omp.com> wrote:
>
> Hi,
>
> I have the same feeling, but I think that shouldn't be the case. A small
> number of big batches should decrease total latency while favoring total
> throughput. And, as Ilya said:
>
> "In a distributed system, throughput will scale with cluster growth, but
> latency will be steady or become slightly worse."
>
> the effects of scaling the cluster should be clearer with a few big batches
> rather than a lot of tiny ones, at least in my understanding.
>
> Unfortunately, the Data Streamer is not yet supported in the C++ API, afaik.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
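To make the batching idea concrete, here is a minimal client-side sketch: instead of one huge putAll (one big transaction holding distributed locks), split the data into fixed-size chunks and issue one putAll per chunk. The batch size of 4 and the println stand-in for the cache call are assumptions for illustration; with Ignite you would call `cache.putAll(batch)` for each chunk (or use `IgniteDataStreamer` where it is available).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class BatchedPut {

    // Split a source map into insertion-ordered chunks of at most batchSize
    // entries each. Each chunk would become one putAll call in real code.
    static <K, V> List<Map<K, V>> partition(Map<K, V> source, int batchSize) {
        List<Map<K, V>> batches = new ArrayList<>();
        Map<K, V> current = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : source.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) {
            data.put(i, "value-" + i);
        }

        // In real code this loop would be: cache.putAll(batch);
        for (Map<Integer, String> batch : partition(data, 4)) {
            System.out.println("putAll with " + batch.size() + " entries");
        }
        // prints: 4, 4, then 2 entries
    }
}
```

Smaller batches trade a little throughput for shorter lock hold times per transaction; tuning the batch size is the usual balance between the two.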