Hi,

I've read all I could find on how Cassandra works, and I'm still wondering why the following query takes more than 5 s to return on a simple (and modest) 3-node Cassandra 2.1.9 cluster:
    SELECT sequence_nr, used FROM messages
    WHERE persistence_id = 'session-SW' AND partition_nr = 0;

The schema of this table:

    CREATE TABLE akka.messages (
        persistence_id text,
        partition_nr bigint,
        sequence_nr bigint,
        message blob,
        used boolean static,
        PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
    ) WITH CLUSTERING ORDER BY (sequence_nr ASC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
        AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99.0PERCENTILE';

This is a query issued by the Akka Cassandra journal (https://github.com/krasserm/akka-persistence-cassandra); it currently returns 33 rows. The time doesn't vary whether I use QUORUM or ONE consistency. Note that the same query with a different `persistence_id` is much faster (it might depend on the number of rows returned).

Here's the (redacted) trace taken with cqlsh:

 activity                                                                   | timestamp                  | source         | source_elapsed
----------------------------------------------------------------------------+----------------------------+----------------+----------------
 Execute CQL3 query                                                         | 2015-10-17 17:01:05.681000 | 192.168.168.26 | 0
 Parsing .................................................................. [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 84
 READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.683000 | 192.168.168.29 | 69
 Preparing statement [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 215
 reading data from /192.168.168.29 [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1189
 Sending READ message to /192.168.168.29 [MessagingService-Outgoing-/192.168.168.29] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1317
 Executing single-partition query on messages [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 175
 Acquiring sstable references [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 189
 Merging memtable tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 204
 Key cache hit for sstable 434 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 257
 Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 269
 Key cache hit for sstable 432 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 514
 Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 527
 Key cache hit for sstable 433 [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 779
 Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 789
 Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 929
 Merging data from memtables and 3 sstables [SharedPool-Worker-1] | 2015-10-17 17:01:05.687000 | 192.168.168.29 | 956
 speculating read retry on /192.168.168.26 [SharedPool-Worker-1] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42423
 Sending READ message to /192.168.168.26 [MessagingService-Outgoing-/192.168.168.26] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42573
 READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42763
 Executing single-partition query on messages [SharedPool-Worker-2] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42855
 Acquiring sstable references [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42868
 Merging memtable tombstones [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42934
 Bloom filter allows skipping sstable 443 [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42965
 Key cache hit for sstable 442 [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42986
 Seeking to partition beginning in data file [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42991
 Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-10-17 17:01:05.726000 | 192.168.168.26 | 43115
 Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-10-17 17:01:05.726000 | 192.168.168.26 | 43137
 Read 33 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-10-17 17:01:11.239000 | 192.168.168.26 | 557248
 Enqueuing response to /192.168.168.26 [SharedPool-Worker-2] | 2015-10-17 17:01:11.244000 | 192.168.168.26 | 562973
 Sending REQUEST_RESPONSE message to /192.168.168.26 [MessagingService-Outgoing-/192.168.168.26] | 2015-10-17 17:01:11.245000 | 192.168.168.26 | 563209
 REQUEST_RESPONSE message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:11.302000 | 192.168.168.26 | 620575
 Processing response from /192.168.168.26 [SharedPool-Worker-2] | 2015-10-17 17:01:11.302000 | 192.168.168.26 | 620792
 Read 33 live and 0 tombstone cells [SharedPool-Worker-1] | 2015-10-17 17:01:11.330000 | 192.168.168.29 | 646481
 Enqueuing response to /192.168.168.26 [SharedPool-Worker-1] | 2015-10-17 17:01:11.335000 | 192.168.168.29 | 651628
 Sending REQUEST_RESPONSE message to /192.168.168.26 [MessagingService-Outgoing-/192.168.168.26] | 2015-10-17 17:01:11.335000 | 192.168.168.29 | 651769
 Request complete | 2015-10-17 17:01:11.312485 | 192.168.168.26 | 631485

I'm not sure I'm reading the trace correctly, but the time seems to be spent in the "Read 33 live and 0 tombstone cells" step. What does this step do? How can I troubleshoot it?

The cluster configuration is the default one, with the following changes: concurrent_reads and concurrent_writes set to 16 (it's very modest hardware, but I also tried 32 without any difference), increased timeouts, and the hsha RPC server. Note that I had to increase the timeouts for the request to succeed at all and be traceable (otherwise it always timed out).

Note that the same query on the same data, loaded from a snapshot into a single-node Cassandra cluster running in a VM on my laptop, doesn't exhibit the issue (it's instantaneous).

Also, an unrelated question: in what unit is the "source_elapsed" column? And is the "timestamp" taken at the start of the operation or at the end?

Thanks for any answers,
-- 
Brice Figureau
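PS: in case it helps anyone reproducing this, the suspicious step can be located mechanically by diffing consecutive source_elapsed values per node. A rough sketch (the event list below is abridged from the trace above, and it assumes source_elapsed grows monotonically for each source):

```python
# Find the largest jump in source_elapsed between consecutive trace
# events on the same node. Events are (activity, source, source_elapsed);
# the three rows below are abridged from the trace in this message.
events = [
    ("Merging data from memtables and 1 sstables", "192.168.168.26", 43137),
    ("Read 33 live and 0 tombstone cells",         "192.168.168.26", 557248),
    ("Enqueuing response to /192.168.168.26",      "192.168.168.26", 562973),
]

def largest_gap(events):
    last = {}    # source -> (activity, elapsed) of the previous event on that node
    best = None  # (gap, from_activity, to_activity, source)
    for activity, source, elapsed in events:
        if source in last:
            prev_activity, prev_elapsed = last[source]
            gap = elapsed - prev_elapsed
            if best is None or gap > best[0]:
                best = (gap, prev_activity, activity, source)
        last[source] = (activity, elapsed)
    return best

gap, frm, to, src = largest_gap(events)
print(f"{gap} elapsed units on {src}: {frm!r} -> {to!r}")
# With the abridged rows above, the biggest gap lands on the
# "Read 33 live and 0 tombstone cells" step.
```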