You are right, Kurt, that's what I was trying to do: lowering the compression 
chunk size and the device read-ahead.

Column-family settings: "compression = {'chunk_length_kb': '16', 
'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor'}"
Device read-ahead: blockdev --setra 8 ....
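
For reference, this is roughly how such a chunk size is applied in cqlsh 
(keyspace/table names below are placeholders, not my real schema); existing 
SSTables only pick it up after a rewrite:

cqlsh> ALTER TABLE my_keyspace.my_table WITH compression =
   ...   {'chunk_length_kb': '16', 'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor'};

# nodetool upgradesstables -a my_keyspace my_table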

I had to fall back to the default RA of 256, and after that I got large merged 
reads and low IOPS with good MB/s.
I believe this isn't caused by C* settings, but by something in the filesystem / 
I/O-related kernel settings (or is it by design?).


I tried to emulate C* reads during compactions with dd:


******  RA=8 (4k)

# blockdev --setra 8 /dev/xvdb
# dd if=/dev/zero of=/data/ZZZ
^C16980952+0 records in
16980951+0 records out
8694246912 bytes (8.7 GB, 8.1 GiB) copied, 36.4651 s, 238 MB/s
# sync

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null
^C846513+0 records in
846512+0 records out
433414144 bytes (433 MB, 413 MiB) copied, 21.4604 s, 20.2 MB/s   <<<<<

High IOPS in this case, I/O size = 4k.
What's interesting is that setting bs=128k in dd didn't decrease IOPS; the I/O 
size was still 4k.
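
A quick way to cross-check the request size actually reaching the device is to 
watch iostat from a second terminal while dd runs (avgrq-sz is reported in 
512-byte sectors, so ~8 means 4k and ~256 means 128k) and to confirm the RA 
value the kernel is using:

# iostat -dkx 1                              # watch the xvdb line: r/s, rkB/s, avgrq-sz
# cat /sys/block/xvdb/queue/read_ahead_kb    # 4 after --setra 8, 128 after --setra 256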


****** RA=256 (128k):
# blockdev --setra 256 /dev/xvdb
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null
^C15123937+0 records in
15123936+0 records out
7743455232 bytes (7.7 GB, 7.2 GiB) copied, 60.8407 s, 127 MB/s  <<<<<<

I/O size 128k, low IOPS, good throughput (limited by EBS bandwidth).

Writes were fine in both cases: I/O size 128k, good throughput limited only by 
EBS bandwidth.

Is the situation above typical for small read-ahead ("the price for small fast 
reads"), or is something wrong with my setup?
[This isn't the XFS mailing list, but somebody here may know this:] Why, with a 
small RA, are even large reads (bs=128k) converted into multiple small reads?
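
One experiment that might separate the page-cache read path from the device 
itself (a sketch, using the same file and device as above): repeat the read 
with O_DIRECT, which bypasses the page cache and read-ahead, so the request 
size submitted to the device should follow dd's bs rather than the RA window:

# blockdev --setra 8 /dev/xvdb
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null bs=128k iflag=direct

If iostat then shows ~128k requests while the buffered bs=128k run still shows 
4k, the splitting would be happening in the buffered read path (the RA window 
capping the size of reads the page cache issues), not in XFS or the block layer.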

Regards,
Kyrill


________________________________
From: kurt greaves <k...@instaclustr.com>
Sent: Tuesday, May 8, 2018 2:12:40 AM
To: User
Subject: Re: compaction: huge number of random reads

If you've got small partitions/small reads you should test lowering your 
compression chunk size on the table and disabling read ahead. This sounds like 
it might just be a case of read amplification.

On Tue., 8 May 2018, 05:43 Kyrylo Lebediev, 
<kyrylo_lebed...@epam.com> wrote:

Dear Experts,


I'm observing strange behavior on a 2.1.20 cluster during compactions.


My setup is:

12 nodes  m4.2xlarge (8 vCPU, 32G RAM) Ubuntu 16.04, 2T EBS gp2.

Filesystem: XFS, blocksize 4k, device read-ahead - 4k

/sys/block/xvdb/queue/nomerges = 0

SizeTieredCompactionStrategy
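
These values can be checked like this (assuming /data is the XFS mount on 
/dev/xvdb):

# blockdev --getra /dev/xvdb                 # read-ahead in 512-byte sectors, 8 = 4k
# cat /sys/block/xvdb/queue/read_ahead_kb    # the same value in kB
# cat /sys/block/xvdb/queue/nomerges         # 0 = request merging enabled
# xfs_info /data                             # data section shows bsize=4096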


After data loads, when effectively nothing else is talking to the cluster and 
compaction is the only activity, I see something like this:
$ iostat -dkx 1
...


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 4769.00  213.00 19076.00 26820.00    18.42     7.95    1.17    1.06    3.76   0.20 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 6098.00  177.00 24392.00 22076.00    14.81     6.46    1.36    0.96   15.16   0.16 100.00

Writes are fine: 177 writes/sec <-> ~22 MB/s, i.e. ~128k per write,

but for some reason compactions generate a huge number of small reads:
6098 reads/s <-> ~24 MB/s, i.e. 24392 kB / 6098 reads ≈ 4k per read.


Why am I getting a huge number of 4k reads instead of a much smaller number of 
large reads?

What could be the reason?


Thanks,

Kyrill

