> But read latency is still something like 30ms which I would think would be
> much higher if it's saturated.
No. You're using stress, so you have some total cap on concurrency. Given a fixed concurrency, you'll saturate at some particular average latency, which is mostly a function of the backlog implied by the concurrency and the average time it takes to process each request. If you double the concurrency of your stress clients, you should expect roughly twice the average latency.

Running a fixed-concurrency benchmark against a saturated cluster will have vastly different effects on latency than trying to serve real live traffic, with no feedback mechanism, on a system that is processing fewer requests than are coming in.

--
/ Peter Schuller
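
To make the arithmetic in the reply above concrete, here is a rough sketch based on Little's law (requests in flight = throughput x latency). The function names and all of the numbers (300 clients, 10,000 req/s, 12,000 req/s of incoming traffic) are made up for illustration and are not taken from the thread; the model also ignores everything except queueing.

```python
def closed_loop_latency_ms(concurrency: int, throughput_rps: float) -> float:
    """Average latency at saturation with a fixed number of in-flight requests.

    Little's law: concurrency = throughput * latency, so
    latency = concurrency / throughput.
    """
    return concurrency * 1000.0 / throughput_rps


def open_loop_wait_ms(arrival_rps: float, service_rps: float, elapsed_s: float) -> float:
    """Queueing delay seen by a request arriving after elapsed_s seconds of overload.

    With no feedback, the backlog grows by (arrival - service) requests every
    second, and a new request has to wait for the whole backlog to drain.
    """
    backlog = max(0.0, arrival_rps - service_rps) * elapsed_s
    return backlog * 1000.0 / service_rps


if __name__ == "__main__":
    # Hypothetical saturated cluster that completes 10,000 requests per second.
    print(closed_loop_latency_ms(300, 10_000))   # fixed-concurrency benchmark
    print(closed_loop_latency_ms(600, 10_000))   # same cluster, twice the clients
    # Hypothetical live traffic at 12,000 req/s against the same cluster.
    print(open_loop_wait_ms(12_000, 10_000, 60))
```

Under these assumed numbers the benchmark settles at a bounded latency that doubles when the client count doubles, while the open-loop case keeps getting worse for as long as the overload lasts.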