This looks fine, 8k read ahead as you mentioned. It doesn't look like a data model issue either, since the reads in https://cl.ly/2c3Z1u2k0u2I appear balanced.
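Just to spell out the two views of the same setting (purely as a sanity check; sda/sdb are the devices backing the data directories in the reports further down): blockdev reports read ahead in 512-byte sectors, the sysfs file in KB, so the RA of 16 below and read_ahead_kb of 8 are the same thing:

blockdev --getra /dev/sda                  # 16 sectors * 512 bytes = 8 KB
cat /sys/block/sda/queue/read_ahead_kb     # 8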
Most likely this is an issue with the new node's configuration. The fact that very little data actually leaves the node rules out "more hot data than can be cached". Are your nodes running Spark jobs with data locality, i.e. filtering locally and only sending a small result set over the network? I find 800M of disk IO for 4M of network transfer really fishy! As a starting point, you could try debugging page faults with:

sar -B 1 10

Regards,

On Sun, Feb 19, 2017 at 2:57 AM, Benjamin Roth <benjamin.r...@jaumo.com> wrote:

> Just for the record, that's what dstat looks like while CS is starting:
>
> root@cas10:~# dstat -lrnv 10
> ---load-avg--- --io/total- -net/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage----
>  1m   5m  15m | read  writ| recv  send|run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq
> 0.69 0.18 0.06| 228  24.3 |    0     0|0.0   0  24|17.8G 3204k  458M  108G|   0     0 |5257k  417k|  17k  3319|  2   1  97   0   0   0
> 0.96 0.26 0.09| 591  27.9 | 522k  476k|4.1   0  69|18.3G 3204k  906M  107G|   0     0 |  45M  287k|  22k  6943|  7   1  92   0   0   0
> 13.2 2.83 0.92|2187  28.7 |1311k  839k|5.3  90  18|18.9G 3204k 9008M 98.1G|   0     0 | 791M 8346k|  49k   25k| 17   1  36  46   0   0
> 30.6 6.91 2.27|2188  67.0 |4200k 3610k|8.8 106  27|19.5G 3204k 17.9G 88.4G|   0     0 | 927M 8396k| 116k  119k| 24   2  17  57   0   0
> 43.6 10.5 3.49|2136  24.3 |4371k 3708k|6.3 108 1.0|19.5G 3204k 26.7G 79.6G|   0     0 | 893M   13M| 117k  159k| 15   1  17  66   0   0
> 56.9 14.4 4.84|2152  32.5 |3937k 3767k| 11  83 5.0|19.5G 3204k 35.5G 70.7G|   0     0 | 894M   14M| 126k  160k| 16   1  16  65   0   0
> 63.2 17.1 5.83|2135  44.1 |4601k 4185k|6.9  99  35|19.6G 3204k 44.3G 61.9G|   0     0 | 879M   15M| 133k  168k| 19   2  19  60   0   0
> 64.6 18.9 6.54|2174  42.2 |4393k 3522k|8.4  93 2.2|20.0G 3204k 52.7G 53.0G|   0     0 | 897M   14M| 138k  160k| 14   2  15  69   0   0
>
> The IO shoots up (791M) as soon as CS has started up and accepts requests.
> I also diffed sysctl on both machines. No significant differences. Only
> CPU-related, random values and some hashes differ.
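Regarding the dstat output above: ~900M of disk reads against ~4M of network traffic right after startup is exactly the imbalance I mean. As far as I know Cassandra normally mmaps its SSTables (disk_access_mode defaults to auto), so those reads show up as page faults rather than ordinary read() calls, which is why sar -B is the quickest way to see where they come from. A rough sketch of what to look for (column names can vary slightly between sysstat versions):

sar -B 1 10
# majflt/s - major faults, i.e. faults that had to be served from disk;
#            a high rate with little network traffic means the reads are
#            not being satisfied from the page cache
# pgpgin/s - KB/s paged in from disk; should roughly track the dstat
#            read column

sar is part of the sysstat package if it isn't installed yet.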
>
> 2017-02-18 21:49 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>
>> 256 tokens:
>>
>> root@cas9:/sys/block/dm-0# blockdev --report
>> RO RA   SSZ BSZ  StartSec           Size  Device
>> rw 256  512 4096        0       67108864  /dev/ram0
>> rw 256  512 4096        0       67108864  /dev/ram1
>> rw 256  512 4096        0       67108864  /dev/ram2
>> rw 256  512 4096        0       67108864  /dev/ram3
>> rw 256  512 4096        0       67108864  /dev/ram4
>> rw 256  512 4096        0       67108864  /dev/ram5
>> rw 256  512 4096        0       67108864  /dev/ram6
>> rw 256  512 4096        0       67108864  /dev/ram7
>> rw 256  512 4096        0       67108864  /dev/ram8
>> rw 256  512 4096        0       67108864  /dev/ram9
>> rw 256  512 4096        0       67108864  /dev/ram10
>> rw 256  512 4096        0       67108864  /dev/ram11
>> rw 256  512 4096        0       67108864  /dev/ram12
>> rw 256  512 4096        0       67108864  /dev/ram13
>> rw 256  512 4096        0       67108864  /dev/ram14
>> rw 256  512 4096        0       67108864  /dev/ram15
>> rw 16   512 4096        0   800166076416  /dev/sda
>> rw 16   512 4096     2048   800164151296  /dev/sda1
>> rw 16   512 4096        0   644245094400  /dev/dm-0
>> rw 16   512 4096        0     2046820352  /dev/dm-1
>> rw 16   512 4096        0     1023410176  /dev/dm-2
>> rw 16   512 4096        0   800166076416  /dev/sdb
>>
>> 512 tokens:
>>
>> root@cas10:/sys/block# blockdev --report
>> RO RA   SSZ BSZ  StartSec           Size  Device
>> rw 256  512 4096        0       67108864  /dev/ram0
>> rw 256  512 4096        0       67108864  /dev/ram1
>> rw 256  512 4096        0       67108864  /dev/ram2
>> rw 256  512 4096        0       67108864  /dev/ram3
>> rw 256  512 4096        0       67108864  /dev/ram4
>> rw 256  512 4096        0       67108864  /dev/ram5
>> rw 256  512 4096        0       67108864  /dev/ram6
>> rw 256  512 4096        0       67108864  /dev/ram7
>> rw 256  512 4096        0       67108864  /dev/ram8
>> rw 256  512 4096        0       67108864  /dev/ram9
>> rw 256  512 4096        0       67108864  /dev/ram10
>> rw 256  512 4096        0       67108864  /dev/ram11
>> rw 256  512 4096        0       67108864  /dev/ram12
>> rw 256  512 4096        0       67108864  /dev/ram13
>> rw 256  512 4096        0       67108864  /dev/ram14
>> rw 256  512 4096        0       67108864  /dev/ram15
>> rw 16   512 4096        0   800166076416  /dev/sda
>> rw 16   512 4096     2048   800164151296  /dev/sda1
>> rw 16   512 4096        0   800166076416  /dev/sdb
>> rw 16   512 4096     2048   800165027840  /dev/sdb1
>> rw 16   512 4096        0  1073741824000  /dev/dm-0
>> rw 16   512 4096        0     2046820352  /dev/dm-1
>> rw 16   512 4096        0     1023410176  /dev/dm-2
>>
>> 2017-02-18 21:41 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>>
>>> Hi Ben,
>>>
>>> If it's the same on both machines, then something else could be the issue.
>>> We faced high disk IO due to a misconfigured read ahead, which resulted in
>>> a high amount of disk IO for a comparatively insignificant network transfer.
>>>
>>> Can you post the output of blockdev --report for a normal node and the
>>> 512-token node?
>>>
>>> Regards,
>>>
>>> On Sun, Feb 19, 2017 at 2:07 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> cat /sys/block/sda/queue/read_ahead_kb
>>>> => 8
>>>>
>>>> On all CS nodes. Is that what you mean?
>>>>
>>>> 2017-02-18 21:32 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> What is the disk read ahead on both nodes?
>>>>>
>>>>> Regards,
>>>>> Bhuvan
>>>>>
>>>>> On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>>>>> wrote:
>>>>>
>>>>>> This is the status of the largest KS on both of these nodes:
>>>>>> UN  10.23.71.10  437.91 GiB  512  49.1%  2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>>>>> UN  10.23.71.9   246.99 GiB  256  28.3%  2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>>>>
>>>>>> So roughly as expected.
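Two things worth noting in the quoted output above: both machines show RA 16 on sda/sdb, i.e. 8 KB read ahead, so read ahead really is identical on the two nodes (if it ever did need changing, blockdev --setra <sectors> <device> would be the knob). And the load matches the ownership nicely:

437.91 GiB / 246.99 GiB ≈ 1.77
49.1 %     / 28.3 %     ≈ 1.74

so the data volume per owned range is about the same on both nodes, which is what makes a data-model skew unlikely.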
>>>>>> >>>>>> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>: >>>>>> >>>>>>> what's the Owns % for the relevant keyspace from nodetool status? >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Benjamin Roth >>>>>> Prokurist >>>>>> >>>>>> Jaumo GmbH · www.jaumo.com >>>>>> Wehrstraße 46 · 73035 Göppingen · Germany >>>>>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1 >>>>>> <07161%203048801> >>>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Benjamin Roth >>>> Prokurist >>>> >>>> Jaumo GmbH · www.jaumo.com >>>> Wehrstraße 46 · 73035 Göppingen · Germany >>>> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1 >>>> <07161%203048801> >>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer >>>> >>> >>> >> >> >> -- >> Benjamin Roth >> Prokurist >> >> Jaumo GmbH · www.jaumo.com >> Wehrstraße 46 · 73035 Göppingen · Germany >> Phone +49 7161 304880-6 <07161%203048806> · Fax +49 7161 304880-1 >> <07161%203048801> >> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer >> > > > > -- > Benjamin Roth > Prokurist > > Jaumo GmbH · www.jaumo.com > Wehrstraße 46 · 73035 Göppingen · Germany > Phone +49 7161 304880-6 · Fax +49 7161 304880-1 > AG Ulm · HRB 731058 · Managing Director: Jens Kammerer >