The challenge is that by passing a large number of records to a single putAll 
call, you’re effectively creating one huge transaction. Transactions require 
distributed locks, which are expensive.
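
For illustration, here’s a rough Java sketch of splitting one big putAll into 
smaller batches (the cache name, key/value types and batch size are just 
placeholders, and the code is an untested sketch):

import java.util.HashMap;
import java.util.Map;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class BatchedPutAll {
    // "myCache", the Long/byte[] types and the batch size are placeholders.
    static void load(Map<Long, byte[]> records) {
        IgniteCache<Long, byte[]> cache = Ignition.ignite().cache("myCache");

        Map<Long, byte[]> batch = new HashMap<>();
        for (Map.Entry<Long, byte[]> e : records.entrySet()) {
            batch.put(e.getKey(), e.getValue());
            if (batch.size() == 1_000) {  // keep each putAll (and its locks) small
                cache.putAll(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty())
            cache.putAll(batch);
    }
}

Each putAll then only has to lock its own batch of keys rather than the whole 
data set.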

You’re right that batching can improve throughput (but not latency). That’s 
what the data streamer does. Here’s a blog post showing a similar approach:

https://www.gridgain.com/resources/blog/how-fast-load-large-datasets-apache-ignite-using-key-value-api

(The code is Java but the approach should work for C++.)
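
For reference, the streamer usage in Java looks roughly like this (again, the 
cache name and key/value types are placeholders; untested sketch):

import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerLoad {
    // "myCache" and the Long/byte[] types are placeholders.
    static void load(Map<Long, byte[]> records) {
        Ignite ignite = Ignition.ignite();  // assumes a node is already started
        try (IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer("myCache")) {
            streamer.allowOverwrite(true);  // only needed if keys may already exist
            for (Map.Entry<Long, byte[]> e : records.entrySet())
                streamer.addData(e.getKey(), e.getValue());
        }  // close() flushes any remaining buffered entries
    }
}

The streamer buffers entries per node and sends them in batches, which is why 
it helps throughput rather than per-entry latency.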

> On 26 Apr 2021, at 09:03, jjimeno <jjim...@omp.com> wrote:
> 
> Hi,
> 
> I have the same feeling, but I think that shouldn't be the case. A small
> number of big batches should decrease the total latency while favoring the
> total throughput. And, as Ilya said:
> 
> "In a distributed system, throughput will scale with cluster growth, but
> latency will be steady or become slightly worse."
> 
> the effects of scaling the cluster should be clearer with a few big batches
> rather than with a lot of tiny ones, at least in my understanding.
> 
> Unfortunately, Data Streamer is not yet supported in the C++ API, afaik.
> 
> 
> 

