Consistent read timeouts for bursts of reads

Emīls Šolmanis Thu, 25 Feb 2016 06:55:51 -0800

Hello,

We're having a problem with concurrent requests. Whenever we try to run
more than ~15 queries at the same time, one or two of them get a read
timeout and then succeed on a retry.


We're running Cassandra 2.2.4 on AWS, accessed via the DataStax driver 2.1.9.

What we've found while investigating:

 * this is not db-wide. Running the same pattern against another table
works fine.
 * 1 or 2 requests fail regardless of how many are executed in parallel,
i.e., it's still 1 or 2 when we ramp up to ~120 concurrent requests, so the
number of failures doesn't seem to scale with concurrency.
 * the problem is consistently reproducible. It happens both under heavier
load and when just firing off a single batch of requests for testing.
 * traces of the failed requests show nothing wrong. An example trace:
https://gist.github.com/emilssolmanis/41e1e2ecdfd9a0569b1a
 * the only peculiar thing in the logs is there's no acknowledgement of the
request being accepted by the server, as seen in
https://gist.github.com/emilssolmanis/242d9d02a6d8fb91da8a
 * there's nothing funny in the timed out Cassandra node's logs around that
time as far as I can tell, not even in the debug logs.
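
The burst pattern described in the list above can be sketched roughly as
follows. This is only an illustration, not the actual test code: it uses no
real driver calls, and ReadTimeout, fire_burst and make_flaky_query are
hypothetical names invented for the sketch. The fake query function simply
times out once for a chosen set of query ids, to mimic "one or two fail, then
succeed on a retry".

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class ReadTimeout(Exception):
    """Stand-in for a driver read timeout (hypothetical, not a driver class)."""


def fire_burst(run_query, n_queries, max_retries=1):
    """Run n_queries calls to run_query concurrently, retrying on timeout.

    Returns (results, timed_out_ids), where timed_out_ids lists the queries
    that hit at least one ReadTimeout before succeeding.
    """
    timed_out = set()
    lock = threading.Lock()

    def attempt(query_id):
        for attempt_no in range(max_retries + 1):
            try:
                return run_query(query_id)
            except ReadTimeout:
                with lock:
                    timed_out.add(query_id)
                if attempt_no == max_retries:
                    raise  # out of retries, surface the timeout

    # One worker per query so the whole burst is in flight at once.
    with ThreadPoolExecutor(max_workers=n_queries) as pool:
        results = list(pool.map(attempt, range(n_queries)))
    return results, sorted(timed_out)


def make_flaky_query(fail_first_attempt_for):
    """Build a fake query function that times out once for the given ids."""
    seen = set()
    lock = threading.Lock()

    def run_query(query_id):
        with lock:
            first_time = query_id not in seen
            seen.add(query_id)
        if first_time and query_id in fail_first_attempt_for:
            raise ReadTimeout(query_id)
        return "row-%d" % query_id

    return run_query
```

Calling fire_burst(make_flaky_query({3, 7}), 20) returns all 20 results and
reports queries 3 and 7 as having timed out once. Against a real cluster,
run_query would be replaced by the actual driver call, and the interesting
part is which queries end up in the timed-out list from run to run.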

Any ideas about what might be causing this, pointers to server config
options, or how else we might debug this would be much appreciated.

Kind regards,
Emils
