Hello Bowen,
Thanks for your response. Yes, we are aware of the reasoning about RAID0 vs
individual JBOD disks. However, all of our clusters use this RAID0 configuration
through Azure, and we see this issue only on this cluster, so it's hard to
attribute the root cause to the disks. This looks more workload related, and we
are seeking feedback on any other parameters in the yaml that we could tune
for this.
Thanks again,
Jiayong Sun
On Thursday, August 12, 2021, 04:55:51 AM PDT, Bowen Song wrote:
Hello Jiayong,
Using multiple disks in a RAID0 for the Cassandra data directory is not
recommended. You will get better fault tolerance, and often better performance
too, with multiple data directories, one on each disk.
If you stick with RAID0, it's not 4 disks but 1 from Cassandra's point of
view, because reads and writes are striped across all 4 member disks.
Therefore, having 4 flush writers doesn't make much sense.
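To put the fault-tolerance point in rough numbers, here is a back-of-the-envelope Python sketch. It assumes each disk fails independently with the same probability p over some period (p = 0.01 is an arbitrary illustrative value, not a measured rate) and ignores replication to other nodes, rebuild time and disk_failure_policy. The function names are hypothetical.

```python
def raid0_expected_loss(p: float, n: int) -> float:
    """Expected fraction of the node's data lost with RAID0:
    any single member failure takes out the whole striped volume."""
    prob_any_failure = 1 - (1 - p) ** n
    return prob_any_failure * 1.0


def jbod_expected_loss(p: float, n: int) -> float:
    """Expected fraction lost with one data directory per disk:
    each failed disk takes out only its own 1/n share of the data."""
    return n * p * (1.0 / n)


if __name__ == "__main__":
    # Illustrative assumptions only: 4 disks, 1% independent failure probability each.
    p, n = 0.01, 4
    print(f"RAID0: {raid0_expected_loss(p, n):.4f}")
    print(f"JBOD:  {jbod_expected_loss(p, n):.4f}")
```

Under these assumptions the RAID0 volume is roughly n times as likely to lose the node's data outright, while separate data directories limit each failure to a quarter of it.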
On the frequent SSTable flush issue, a quick internet search leads me to:
* an old bug in Cassandra 2.1, CASSANDRA-8409, which shouldn't affect 3.x at all
* a StackOverflow question that may be related
Did you run repair? Do you use materialized views?
Regards,
Bowen
On 11/08/2021 15:58, Jiayong Sun wrote:
Hi Erick,
The nodes have 4 SSDs in RAID0 (1TB each, but we only use 2.4TB of space;
current disk usage is about 50%). Based on the number of disks, we increased
memtable_flush_writers to 4 instead of the default of 2.
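One side effect of raising memtable_flush_writers that may be worth checking: in Cassandra 3.x, when memtable_cleanup_threshold is not set explicitly, cassandra.yaml documents its default as 1 / (memtable_flush_writers + 1). A minimal Python sketch of that arithmetic, assuming the threshold is left unset in this cluster's yaml (the helper name is hypothetical):

```python
def default_cleanup_threshold(flush_writers: int) -> float:
    # Documented 3.x default when memtable_cleanup_threshold is not set explicitly.
    return 1.0 / (flush_writers + 1)


for writers in (2, 4):
    print(f"memtable_flush_writers={writers}: "
          f"memtable_cleanup_threshold={default_cleanup_threshold(writers):.2f}")
```

The threshold is the ratio of occupied non-flushing memtable space to total permitted space that triggers a flush of the largest memtable, so going from 2 to 4 writers lowers it from about 0.33 to 0.20, which tends to mean more frequent, smaller flushes.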
For the following we set:
- max heap size: 31GB
- memtable_heap_space_in_mb: default
- memtable_offheap_space_in_mb: default
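Since both memtable space settings are left at their defaults, it may help to spell out what they resolve to. cassandra.yaml in 3.x states that if they are omitted, each is set to 1/4 of the heap. A rough Python sketch for the 31GB heap mentioned here; the JVM's reported max heap can be slightly below -Xmx, so treat the figure as approximate (the helper name is hypothetical):

```python
HEAP_MB = 31 * 1024  # max heap size of 31GB, as stated in this thread


def default_memtable_space_mb(heap_mb: int) -> int:
    # 3.x default when memtable_heap_space_in_mb / memtable_offheap_space_in_mb
    # are omitted: one quarter of the heap each.
    return heap_mb // 4


space = default_memtable_space_mb(HEAP_MB)
print(f"memtable_heap_space_in_mb    ~ {space} MB")
print(f"memtable_offheap_space_in_mb ~ {space} MB")
```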
In the logs, we also noticed that the system.sstable_activity table has hundreds
of MB or GB of data and is constantly flushing:
DEBUG [NativePoolCleaner] <timestamp> ColumnFamilyStore.java:932 - Enqueuing flush of sstable_activity: 0.293KiB (0%) on-heap, 0.107KiB (0%) off-heap
DEBUG [NonPeriodicTasks:1] <timestamp> SSTable.java:105 - Deleting sstable:/app/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/md-103645-big
DEBUG [NativePoolCleaner] <timestamp> ColumnFamilyStore.java:1322 - Flushing largest CFS(Keyspace='system', ColumnFamily='sstable_activity') to free up room. Used total: 0.06/1.00, live: 0.00/0.00, flushing: 0.02/0.29, this: 0.00/0.00
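To put numbers on how small these flushes are, a minimal Python sketch that tallies "Enqueuing flush of" lines per table. The regex is written against the line format quoted above only; other Cassandra versions may word it differently, and the unit list (bytes, KiB, MiB, GiB) is an assumption about how sizes are printed. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "Enqueuing flush of sstable_activity: 0.293KiB (0%) on-heap, ..."
# Units handled are an assumption: bytes, KiB, MiB, GiB.
FLUSH_RE = re.compile(
    r"Enqueuing flush of (?P<table>\w+): "
    r"(?P<size>[\d.]+)\s?(?P<unit>bytes|KiB|MiB|GiB) \(\d+%\) on-heap"
)
UNIT_TO_KIB = {"bytes": 1.0 / 1024, "KiB": 1.0, "MiB": 1024.0, "GiB": 1024.0 * 1024.0}


def tally_flushes(lines):
    """Return {table: (flush_count, total_on_heap_KiB)} for the given log lines."""
    counts = defaultdict(int)
    sizes = defaultdict(float)
    for line in lines:
        # finditer also copes with several log entries run together on one line.
        for m in FLUSH_RE.finditer(line):
            table = m.group("table")
            counts[table] += 1
            sizes[table] += float(m.group("size")) * UNIT_TO_KIB[m.group("unit")]
    return {t: (counts[t], sizes[t]) for t in counts}


sample = [
    "DEBUG [NativePoolCleaner] <timestamp> ColumnFamilyStore.java:932 - Enqueuing "
    "flush of sstable_activity: 0.293KiB (0%) on-heap, 0.107KiB (0%) off-heap",
]
for table, (n, kib) in tally_flushes(sample).items():
    print(f"{table}: {n} flushes, avg {kib / n:.3f} KiB on-heap")
```

To run it against a real log, pass an open file handle instead of `sample`, e.g. `tally_flushes(open("debug.log"))`, where the path is a placeholder for wherever the node's debug log lives.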
Thanks,
Jiayong Sun
On Wednesday, August 11, 2021, 12:06:27 AM PDT, Erick Ramirez wrote:
Having 4 flush writers isn't bad, since the default is 2. It makes no
difference if you have fast disks (like NVMe SSDs), because only 1 thread ends
up being used.
But if flushes are slow, the work gets distributed across 4 flush writers, so
you end up with smaller flush sizes, although it's difficult to tell how tiny
the SSTables would be without analysing the logs and the overall performance of
your cluster.
Was there a specific reason you decided to bump it up to 4? I'm just trying
to get a sense of why you did it since it might provide some clues. Out of
curiosity, what do you have set for the following?
- max heap size
- memtable_heap_space_in_mb
- memtable_offheap_space_in_mb