Hi Kurt,

Thanks for the response. A few comments inline:
On Wed, Jun 28, 2017 at 1:17 PM, kurt greaves <k...@instaclustr.com> wrote:

> You're correct in that the timeout is only driver side. The server will
> have its own timeouts configured in the cassandra.yaml file.

Yup, OK.

> I suspect either that you have a node down in your cluster (or 4),

Nope, that’s not what is happening, as a) we have monitoring on all nodes, and b) there is nothing in the logs.

> or your queries are gradually getting slower.

Perhaps, but we have query time metrics that don’t seem to indicate any obvious issues. See the attached metrics from the last 12 hours for quorum queries.

> This kind of aligns with the slow query statements in your logs. Are you
> making changes/updates to the partitions that you are querying?

No.

> It could be that the partitions are now spread across multiple SSTables
> and thus slowing things down. You should perform a trace to get a better
> idea of the issue.

If I run a CONSISTENCY QUORUM or ALL range query, it is visibly very slow in cqlsh and unfortunately results in a trace failure: “Statement trace did not complete within 10 seconds”.

> A hacky workaround would be to increase your read timeouts server side
> (read_timeout_in_ms), however this will mask underlying data model issues.

Yup, I certainly don’t like the idea of that.

I’m interested in what you said about the partitions being spread across multiple SSTables. Any pointers on what to look for there?

I then wondered if perhaps a range query is really just not a good idea, even if only for monitoring purposes, so I tried querying for just one row with the ID specified, i.e. something like:

  SELECT * FROM keyspace.table WHERE id = 123;

It was still incredibly slow (with CONSISTENCY ALL) and failed a few times to generate a trace, but finally resulted in the trace at https://gist.github.com/mattheworiordan/b1133008bf6fd14bfe6937a0004c8789#file-cassandra-trace-log . The worst offender seemed to be 34.207.246.175, so I ran the same query on that instance itself to see whether it is under load or servicing requests slowly, and it’s not; see https://gist.github.com/mattheworiordan/b1133008bf6fd14bfe6937a0004c8789#file-local-cassandra-trace-log .

So as far as I can tell, it looks like there may be some issue with the nodes communicating with each other, but the logs don’t reveal much. Where to now?

--
Regards,

Matthew O'Riordan
CEO who codes
Ably - simply better realtime <https://www.ably.io/>

*Ably News: Ably push notifications have gone live
<https://blog.ably.io/ably-push-notifications-are-now-available-64cb8ae37e74>*
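P.S. A few concrete next steps I'm planning to try; the exact option and setting names below are assumptions on my part to verify against our versions rather than anything I've confirmed. First, the "10 seconds" in the trace failure looks like cqlsh's own trace wait rather than a server timeout, so I'll retry the single-partition trace with both the client-side request timeout and the trace wait raised:

  # start cqlsh with a longer client-side request timeout (in seconds)
  cqlsh <host> --request-timeout=120

  # in ~/.cassandra/cqlshrc, let cqlsh wait longer for trace events to arrive:
  #   [tracing]
  #   max_trace_wait = 120

  # then, inside cqlsh:
  #   CONSISTENCY ALL;
  #   TRACING ON;
  #   SELECT * FROM keyspace.table WHERE id = 123;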
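Second, on the SSTables question: I'm assuming the thing to look at is SSTables-per-read for the table, along the lines of the commands below (names as in 3.x-era nodetool; I believe older versions call these cfhistograms / cfstats):

  # distribution of SSTables touched per read, plus read/write latency percentiles
  nodetool tablehistograms keyspace table

  # table-level stats: SSTable count, partition sizes, tombstones scanned per read, etc.
  nodetool tablestats keyspace.table

  # a compaction backlog could also explain reads hitting many SSTables
  nodetool compactionstats

Is that roughly what you had in mind, or is there something better to look at?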
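Finally, for reference on the server-side timeouts you mentioned: my understanding (again, an assumption to double-check against the cassandra.yaml shipped with our version) is that the relevant settings are the read and range request timeouts, roughly:

  # cassandra.yaml (defaults shown are approximate)
  # read_request_timeout_in_ms: 5000      # single-partition reads
  # range_request_timeout_in_ms: 10000    # range scans, e.g. the monitoring query

As above, though, I'd rather fix the underlying issue than raise these and mask it.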
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org