> Keep in mind though that if you do
> 10ms batches, you're going to add an average of 5ms latency to your
> messages compared to the theoretical throughput of an SSD that didn't have
> that limitation, even if your current message rates are slow enough that
> the SSD can keep up without batching.

Oh.  That’s one of the cool things about the ‘smart batching’ algorithm: the
latencies will fall automatically on an SSD.

This is because we just grab ALL the messages (or a configured limit of
them) off the queue, commit them all at once, and then bulk ack them.
(There’s a sketch of that loop below.)

So in practice, on an SSD, the latency would be a function of the drive’s
IOPS.

So if the SSD can do 10k IOPS, the per-message commit latency will be
around 100µs.
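
Back-of-the-envelope: each batched commit costs one sync’ed I/O, so
1 s / 10,000 IOPS = 100µs per commit, no matter how many messages ride in
that batch.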

Grabbing ALL the messages on the queue is probably a bit extreme, though.
You wouldn’t want to read too many at once because that would increase
latency.  So a configurable maxSmartBatchSize cap would probably be good.
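
A minimal sketch of that writer loop, assuming producers hand messages
over via a JDK BlockingQueue; commit() and ack() are hypothetical stand-ins
for the journal append + fsync and the producer acknowledgement, and
maxSmartBatchSize is the cap mentioned above:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;

    // Sketch only: the smart-batching writer loop.  take()/drainTo() are
    // real JDK calls; commit() and ack() stand in for the broker's journal
    // append + fsync and the producer acknowledgement.
    final class SmartBatchWriter<M> implements Runnable {

        private final BlockingQueue<M> queue;
        private final int maxSmartBatchSize;

        SmartBatchWriter(BlockingQueue<M> queue, int maxSmartBatchSize) {
            this.queue = queue;
            this.maxSmartBatchSize = maxSmartBatchSize;
        }

        @Override
        public void run() {
            List<M> batch = new ArrayList<>(maxSmartBatchSize);
            try {
                while (true) {
                    batch.add(queue.take());                     // block for the first message
                    queue.drainTo(batch, maxSmartBatchSize - 1); // grab whatever queued up meanwhile
                    commit(batch);                               // ONE write + ONE fsync for the whole batch
                    for (M m : batch) {
                        ack(m);                                  // bulk ack after the single sync
                    }
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();              // preserve interrupt for shutdown
            }
        }

        private void commit(List<M> batch) { /* append batch to journal, then fsync */ }

        private void ack(M m) { /* tell producers their messages are durable */ }
    }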

> So if batching is implemented, I'd definitely want it to be a configurable
> option rather than the one and only way it works.


Agreed.  It would probably need to be implemented that way at first, just
to avoid scaring people.  If it works well, it could be enabled by default
in a future version.
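
Concretely, I’d picture something like this in the config (both option
names are hypothetical, nothing that exists today):

    smartBatching=false       # opt-in for now; maybe default-on once it proves out
    maxSmartBatchSize=256     # cap on messages per sync'ed commit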


> The easy option would
> simply be a configuration option to turn batching on or off (and always pay
> the 5ms latency cost); the better option would be an adaptive strategy that
> (if enabled) recognizes when your throughput crosses the worth-it threshold
> and begins batching, and then recognizes when you drop back below it and
> goes back to the sync-per-message approach.

Note that batching effectively doesn’t happen if you’re writing at a slow
rate.

If you emit one message per second, the queue length per commit will be 1,
so effectively you’re batching with one item in the batch, and the added
latency isn’t there.
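
In the sketch above that’s exactly how drainTo() behaves: it only grabs
what’s already sitting in the queue.  At one message per second each
“batch” is a single message, so the slow-rate path degenerates to plain
sync-per-message with no added wait.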


> Also, keep in mind that this strategy is only going to make a performance
> improvement if it's possible to process messages in parallel; if the code
> is written so you can't move to the next message in a destination till the
> previous one is sync'ed, then you'll get an improvement by eliminating
> contention between destinations but you'll go the same speed (slower,
> actually, due to the 5ms average latency of the batching) within a single
> destination.  So as usual, YMMV, which is another reason why this should be
> an option rather than the only option.

Agreed.  I started looking at the code, and whether that’s possible wasn’t
immediately obvious.  I’d probably need 2-4 hours, and I just don’t have
that right now :-(



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
