Hi,

I've read all I could find on how Cassandra works, and I'm still wondering why the following query takes more than 5 s to return on a simple (and modest) 3-node Cassandra 2.1.9 cluster:
    SELECT sequence_nr, used FROM messages
    WHERE persistence_id = 'session-SW' AND partition_nr = 0;

The schema of this table:

    CREATE TABLE akka.messages (
        persistence_id text,
        partition_nr bigint,
        sequence_nr bigint,
        message blob,
        used boolean static,
        PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
    ) WITH CLUSTERING ORDER BY (sequence_nr ASC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
        AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99.0PERCENTILE';

This is a query issued by the Akka Cassandra journal (https://github.com/krasserm/akka-persistence-cassandra); it currently returns 33 rows. The time doesn't vary whether I use QUORUM or ONE consistency. Note that the same query with a different `persistence_id` is much faster (it might depend on the number of rows returned).

Here's the (redacted) trace taken with cqlsh:

 activity                                                                   | timestamp                  | source         | source_elapsed
----------------------------------------------------------------------------+----------------------------+----------------+----------------
 Execute CQL3 query                                                         | 2015-10-17 17:01:05.681000 | 192.168.168.26 | 0
 Parsing .................................................................. [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 84
 READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.683000 | 192.168.168.29 | 69
 Preparing statement [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 215
 reading data from /192.168.168.29 [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1189
 Sending READ message to /192.168.168.29 [MessagingService-Outgoing-/192.168.168.29] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1317
 Executing single-partition query on messages [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 175
 Acquiring sstable references [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 189
 Merging memtable tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 204
 Key cache hit for sstable 434 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 257
 Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 269
 Key cache hit for sstable 432 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 514
 Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 527
 Key cache hit for sstable 433 [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 779
 Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 789
 Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 929
 Merging data from memtables and 3 sstables [SharedPool-Worker-1] | 2015-10-17 17:01:05.687000 | 192.168.168.29 | 956
 speculating read retry on /192.168.168.26 [SharedPool-Worker-1] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42423
 Sending READ message to /192.168.168.26 [MessagingService-Outgoing-/192.168.168.26] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42573
 READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42763
 Executing single-partition query on messages [SharedPool-Worker-2] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42855
 Acquiring sstable references [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42868
 Merging memtable tombstones [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42934
 Bloom filter allows skipping sstable 443 [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42965
 Key cache hit for sstable 442 [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42986
 Seeking to partition beginning in data file [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42991
 Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-10-17 17:01:05.726000 | 192.168.168.26 | 43115
 Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-10-17 17:01:05.726000 | 192.168.168.26 | 43137
 Read 33 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-10-17 17:01:11.239000 | 192.168.168.26 | 557248
 Enqueuing response to /192.168.168.26 [SharedPool-Worker-2] | 2015-10-17 17:01:11.244000 | 192.168.168.26 | 562973
 Sending REQUEST_RESPONSE message to /192.168.168.26 [MessagingService-Outgoing-/192.168.168.26] | 2015-10-17 17:01:11.245000 | 192.168.168.26 | 563209
 REQUEST_RESPONSE message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:11.302000 | 192.168.168.26 | 620575
 Processing response from /192.168.168.26 [SharedPool-Worker-2] | 2015-10-17 17:01:11.302000 | 192.168.168.26 | 620792
 Read 33 live and 0 tombstone cells [SharedPool-Worker-1] | 2015-10-17 17:01:11.330000 | 192.168.168.29 | 646481
 Enqueuing response to /192.168.168.26 [SharedPool-Worker-1] | 2015-10-17 17:01:11.335000 | 192.168.168.29 | 651628
 Sending REQUEST_RESPONSE message to /192.168.168.26 [MessagingService-Outgoing-/192.168.168.26] | 2015-10-17 17:01:11.335000 | 192.168.168.29 | 651769
 Request complete | 2015-10-17 17:01:11.312485 | 192.168.168.26 | 631485

I'm not sure I'm reading the trace correctly, but the time seems to be spent in the "Read 33 live and 0 tombstone cells" step. What does this step do? How can I troubleshoot it?

The cluster configuration is the default one, with the following changes: concurrent_reads and concurrent_writes set to 16 (it's very modest hardware, but I also tried 32 without any difference), increased timeouts, and the hsha RPC server. Note that I had to increase the timeouts for the request to succeed at all and be traceable (otherwise it always timed out).

Note that the same query on the same data, loaded from a snapshot into a single-node Cassandra cluster running in a VM on my laptop, doesn't exhibit the issue (it's instantaneous).

Also, an unrelated question: in what unit is the "source_elapsed" column? And is the "timestamp" taken at the start of the operation or at the end?

Thanks for any answers,
-- 
Brice Figureau
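PS: in case it helps anyone reproducing this, the suspicious step can be located mechanically by diffing consecutive source_elapsed values per node. A rough sketch (the event list below is abridged from the trace above, and it assumes source_elapsed grows monotonically for each source):

```python
# Find the largest jump in source_elapsed between consecutive trace
# events on the same node. Events are (activity, source, source_elapsed);
# the three rows below are abridged from the trace in this message.
events = [
    ("Merging data from memtables and 1 sstables", "192.168.168.26", 43137),
    ("Read 33 live and 0 tombstone cells",         "192.168.168.26", 557248),
    ("Enqueuing response to /192.168.168.26",      "192.168.168.26", 562973),
]

def largest_gap(events):
    last = {}    # source -> (activity, elapsed) of the previous event on that node
    best = None  # (gap, from_activity, to_activity, source)
    for activity, source, elapsed in events:
        if source in last:
            prev_activity, prev_elapsed = last[source]
            gap = elapsed - prev_elapsed
            if best is None or gap > best[0]:
                best = (gap, prev_activity, activity, source)
        last[source] = (activity, elapsed)
    return best

gap, frm, to, src = largest_gap(events)
print(f"{gap} elapsed units on {src}: {frm!r} -> {to!r}")
# With the abridged rows above, the biggest gap lands on the
# "Read 33 live and 0 tombstone cells" step.
```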