Could you send the cfhistograms and cfstats output for the column family you're reading?
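(A sketch of how to pull those, assuming the `akka` keyspace and `messages` table from the quoted schema below:)

```shell
# Per-table read/write latency and partition-size histograms
# (Cassandra 2.1 naming; renamed tablehistograms in later versions)
nodetool cfhistograms akka messages

# Per-table summary: sstable count, read latency, space used, etc.
nodetool cfstats akka.messages
```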
That could help.

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 17 October 2015 at 16:15, Brice Figureau <brice+cassan...@daysofwonder.com> wrote:
> Hi,
>
> I've read all I could find on how Cassandra works, and I'm still wondering why
> the following query takes more than 5s to return on a simple (and modest)
> 3-node Cassandra 2.1.9 cluster:
>
> SELECT sequence_nr, used
> FROM messages
> WHERE persistence_id = 'session-SW' AND partition_nr = 0;
>
> The schema of this table:
>
> CREATE TABLE akka.messages (
>     persistence_id text,
>     partition_nr bigint,
>     sequence_nr bigint,
>     message blob,
>     used boolean static,
>     PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
> ) WITH CLUSTERING ORDER BY (sequence_nr ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> This is a query from the Akka Cassandra journal
> (https://github.com/krasserm/akka-persistence-cassandra), which returns 33
> rows at the moment.
>
> The time doesn't vary whether I use QUORUM or ONE consistency.
> Note that the same query with a different `persistence_id` is much
> faster (it might depend on the number of rows returned).
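(For anyone reproducing this: a trace like the one quoted below can be captured from cqlsh; a sketch, assuming cqlsh can reach a coordinator node directly:)

```shell
# In a cqlsh session on any node; the trace table prints after the rows.
#   cqlsh> USE akka;
#   cqlsh> TRACING ON;
#   cqlsh> SELECT sequence_nr, used FROM messages
#      ... WHERE persistence_id = 'session-SW' AND partition_nr = 0;
# TRACING OFF; disables it again.
```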
> Here's the (redacted) trace taken with cqlsh:
>
> activity | timestamp | source | source_elapsed
> ---------+-----------+--------+---------------
> Execute CQL3 query | 2015-10-17 17:01:05.681000 | 192.168.168.26 | 0
> Parsing [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 84
> READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.683000 | 192.168.168.29 | 69
> Preparing statement [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 215
> reading data from /192.168.168.29 [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1189
> Sending READ message to /192.168.168.29 [MessagingService-Outgoing-/192.168.168.29] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1317
> Executing single-partition query on messages [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 175
> Acquiring sstable references [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 189
> Merging memtable tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 204
> Key cache hit for sstable 434 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 257
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 269
> Key cache hit for sstable 432 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 514
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 527
> Key cache hit for sstable 433 [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 779
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 789
> Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 929
> Merging data from memtables and 3 sstables [SharedPool-Worker-1] | 2015-10-17 17:01:05.687000 | 192.168.168.29 | 956
> speculating read retry on /192.168.168.26 [SharedPool-Worker-1] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42423
> Sending READ message to /192.168.168.26 [MessagingService-Outgoing-/192.168.168.26] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42573
> READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42763
> Executing single-partition query on messages [SharedPool-Worker-2] | 2015-10-17 17:01:05.724000 | 192.168.168.26 | 42855
> Acquiring sstable references [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42868
> Merging memtable tombstones [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42934
> Bloom filter allows skipping sstable 443 [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42965
> Key cache hit for sstable 442 [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42986
> Seeking to partition beginning in data file [SharedPool-Worker-2] | 2015-10-17 17:01:05.725000 | 192.168.168.26 | 42991
> Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-10-17 17:01:05.726000 | 192.168.168.26 | 43115
> Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-10-17 17:01:05.726000 | 192.168.168.26 | 43137
> Read 33 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-10-17 17:01:11.239000 | 192.168.168.26 | 557248
> Enqueuing response to /192.168.168.26 [SharedPool-Worker-2] | 2015-10-17 17:01:11.244000 | 192.168.168.26 | 562973
> Sending REQUEST_RESPONSE message to /192.168.168.26 [MessagingService-Outgoing-/192.168.168.26] | 2015-10-17 17:01:11.245000 | 192.168.168.26 | 563209
> REQUEST_RESPONSE message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:11.302000 | 192.168.168.26 | 620575
> Processing response from /192.168.168.26 [SharedPool-Worker-2] | 2015-10-17 17:01:11.302000 | 192.168.168.26 | 620792
> Read 33 live and 0 tombstone cells [SharedPool-Worker-1] | 2015-10-17 17:01:11.330000 | 192.168.168.29 | 646481
> Enqueuing response to /192.168.168.26 [SharedPool-Worker-1] | 2015-10-17 17:01:11.335000 | 192.168.168.29 | 651628
> Sending REQUEST_RESPONSE message to /192.168.168.26 [MessagingService-Outgoing-/192.168.168.26] | 2015-10-17 17:01:11.335000 | 192.168.168.29 | 651769
> Request complete | 2015-10-17 17:01:11.312485 | 192.168.168.26 | 631485
>
> I'm not sure I'm reading the trace correctly, but it seems the time is spent on the
> "Read 33 live and 0 tombstone cells" step.
>
> What does this step do? How can I troubleshoot it?
>
> The cluster configuration is the default one, with the following changes:
> concurrent_reads and concurrent_writes set to 16 (because it's very modest
> hardware, but I tried 32 without any difference), increased timeouts, and the
> hsha RPC server.
>
> Note that I had to increase the timeouts for the request to succeed and be
> able to trace it (otherwise it was always timing out).
>
> Note that the same request on the same data, loaded from a snapshot into a
> single-node Cassandra cluster running in a VM on my laptop, doesn't exhibit
> the same issue (it's instantaneous).
>
> Also, an unrelated question: in what unit is the "source_elapsed" column?
> And is the "timestamp" taken at the start of the operation or at the end?
>
> Thanks for any answers,
> --
> Brice Figureau
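(On reading the trace: `source_elapsed` is in microseconds since the node in the `source` column started handling the request, so diffing consecutive events on one node locates the stall. A quick check with the coordinator-local read values quoted above:)

```shell
# source_elapsed values (microseconds) from the trace on 192.168.168.26:
merge_start=43137    # "Merging data from memtables and 1 sstables"
read_done=557248     # "Read 33 live and 0 tombstone cells"

# Gap between the two steps, converted to milliseconds.
echo "$(( (read_done - merge_start) / 1000 )) ms"   # prints "514 ms"
```

So essentially the whole local read is spent between starting the merge and finishing the cell read, which matches the suspicion in the email.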