batch mutates & throughput

Philippe Sun, 07 Aug 2011 14:06:59 -0700

A question regarding batch mutates and how others might be throttling the
system to prevent timeouts.


My 3-node, RF=3 cluster has been performing OK while bulk loading data
(applying counter updates). I've been able to run 16 threads in parallel,
each performing about 400 mutates/s on a loaded cluster.
Then I thought, hey, let's get rid of the network round trip and batch this
thing...

So I converted my code to use a mutator and addCounter instead of
insertCounter (on Hector). However, when I do, the results are always bad.
When I call execute()
 - every 5000 lines, I get wonderful performance but I constantly get
timeouts
 - every 500, same thing
 - every 10, the timeouts take longer to appear but they're still there
 - every 1, it works just like before batching
This happens even with a single thread running.

So my question is not about the absolute performance of my cluster but about
how I'm supposed to use batch updates: it doesn't look like the execute()
call blocks until the mutations have actually been applied, and tpstats has
shown up to 200,000 mutations pending.
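
To make the question concrete, here is a minimal sketch of the kind of
client-side throttling I could imagine: buffer mutations, flush every N, and
pace the flushes so the average rate stays under a cap. PacedBatcher and
BatchSink are hypothetical names, not Hector classes; the sink would be a thin
wrapper around the mutator's execute() call, and the batch size and rate in
the usage note are arbitrary. It deliberately does not retry on failure,
since counter increments are not idempotent and a timed-out batch may already
have been applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: buffers mutations and sends them in paced batches.
class PacedBatcher {
    // Stand-in for "send one batch", e.g. a wrapper around mutator.execute().
    interface BatchSink {
        void send(List<String> batch);
    }

    private final BatchSink sink;
    private final int batchSize;
    private final double maxMutationsPerSecond;
    private final List<String> pending = new ArrayList<>();
    private long nextSendNanos = System.nanoTime();

    PacedBatcher(BatchSink sink, int batchSize, double maxMutationsPerSecond) {
        this.sink = sink;
        this.batchSize = batchSize;
        this.maxMutationsPerSecond = maxMutationsPerSecond;
    }

    // Minimum spacing after a batch of the given size so that the
    // average rate does not exceed maxPerSecond.
    static long minIntervalNanos(int mutations, double maxPerSecond) {
        return (long) (mutations * 1e9 / maxPerSecond);
    }

    // Buffers one mutation and flushes once a full batch has accumulated.
    void add(String mutation) {
        pending.add(mutation);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Waits until the pacing deadline, then sends whatever is buffered.
    // If the sink throws, the buffer is left untouched for the caller to handle.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        long waitNanos = nextSendNanos - System.nanoTime();
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while throttling", e);
            }
        }
        int sent = pending.size();
        sink.send(new ArrayList<>(pending));
        pending.clear();
        // The next batch may not go out before this deadline.
        nextSendNanos = System.nanoTime() + minIntervalNanos(sent, maxMutationsPerSecond);
    }
}
```

With a batch size of 500 and a cap of 5000 mutations/s, for example, this
would send at most one batch every 100 ms per thread. Whether pacing like this
is the right approach, or whether the pending mutations point at something
else, is exactly what I'm unsure about.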

Any ideas?
