You will need to experiment with chunk_length based on your dataset. At the end of the day it's about finding the sweet spot: chunk_length needs to be big enough that you get a decent compression ratio (larger chunks increase the likelihood of a better compression ratio, which means you read less from disk), but small enough that you are not also reading unrelated data from disk.
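For reference, the chunk size is set per table through the compression options. A minimal sketch (the 16 KB value is just an illustrative starting point for experimentation, not a recommendation):

```sql
-- Change the compression chunk size on the cassandra-stress table.
-- 16 KB here is only an example value; benchmark against your own workload.
ALTER TABLE "Keyspace1".standard1
  WITH compression = {
    'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor',
    'chunk_length_kb': '16'
  };
```

Note that existing SSTables keep their old chunk size until they are rewritten (e.g. by compaction or nodetool upgradesstables -a), so re-run your read benchmark only after the data has been rewritten.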
But... before you go down the chunk_length testing rabbit hole, make sure you are using a sane readahead value on the block device your data directory sits on. For example, if you are on AWS and using a RAID device built with mdadm, the readahead value for the block device can be as high as 128kb by default. If you are on SSDs you can safely drop it to 8 or 16 (or even 0) and see a big uptick in read performance.

For lots of juicy low-level disk tuning and further details, see Al Tobey's guide: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

On Fri, 29 Jan 2016 at 08:26 Jean Carlo <jean.jeancar...@gmail.com> wrote:

> Hi guys
>
> I want to set the param chunk_length_kb in order to improve the read
> latency of my cassandra_stress test.
>
> This is the table:
>
> CREATE TABLE "Keyspace1".standard1 (
>     key blob PRIMARY KEY,
>     "C0" blob,
>     "C1" blob,
>     "C2" blob,
>     "C3" blob,
>     "C4" blob
> ) WITH bloom_filter_fp_chance = 0.1
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'sstable_size_in_mb': '160',
>         'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'sstable_compression':
>         'org.apache.cassandra.io.compress.SnappyCompressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> I have 6 columns of type blob.
> This table is filled by cassandra_stress:
>
> admin@cqlsh:Keyspace1> select * from standard1 limit 2;
>
>  key | C0 | C1 | C2 | C3 | C4
> -----+----+----+----+----+----
>  0x4b343050393536353531 |
>  0xe0e3d68ed1536e4d994aa74860270ac91cf7941acb5eefd925815481298f0d558d4f |
>  0xa43f78202576f1ccbdf50657792fac06f0ca7c9416ee68a08125c8dce4dfd085131d |
>  0xab12b06bf64c73e708d1b96fea9badc678303906e3d5f5f96fae7d8092ee0df0c54c |
>  0x428a157cb598487a1b938bdb6c45b09fad3b6408fddc290a6b332b91426b00ddaeb2 |
>  0x0583038d881ab25be72155bc3aa5cb9ec3aab8e795601abe63a2b35f48ce1e359f5e
>
> I am having a read latency of ~500 microseconds; I think it takes too much
> time compared to the write latency of ~30 microseconds.
>
> My first clue is to set chunk_length_kb to a value close to the size of the
> rows in kb.
>
> Am I in the right direction? If so, how can I compute the size of a row?
>
> Other question: might the "Compacted partition" value from the command
> nodetool cfstats give me a value close to chunk_length_kb?
>
> Best regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer
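To inspect and adjust the readahead value Ben describes, blockdev(8) can be used on the device backing the data directory. A sketch, assuming the data sits on /dev/md0 (substitute your own device; these commands need root, and --getra/--setra work in 512-byte sectors, so 16 sectors = 8 KB):

```shell
# Show the current readahead for the device, in 512-byte sectors.
blockdev --getra /dev/md0

# Drop it for an SSD-backed volume: 16 sectors = 8 KB of readahead.
blockdev --setra 16 /dev/md0

# This does not persist across reboots; add it to a udev rule or an
# init script (distribution-specific) to make it permanent.
```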