Hi Thomas, I don't think I have ever seen compaction run faster than that.
For me, tables with small values usually compact at around 5 MB/s for a single compaction. With larger blobs (a few KB per blob) I have seen 16 MB/s. Both with "nodetool setcompactionthroughput 0". I don't think it's disk-related either. I suspect parsing the data simply saturates the CPU, or perhaps the issue is GC-related. But I have never dug into it; I just observed low IO-wait percentages in top.

regards,
Christian

On Thu, Apr 26, 2018 at 7:39 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> I can't say for sure, because I haven't measured it, but I've seen a
> combination of readahead + large chunk size with compression cause serious
> issues with read amplification, although I'm not sure if or how it would
> apply here. It likely depends on the size of your partitions and the
> fragmentation of the sstables, although at only 5 GB I'm really surprised
> to hear of 32 GB read in; that seems a bit absurd.
>
> Definitely something to dig deeper into.
>
> On Thu, Apr 26, 2018 at 5:02 AM Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
>> Hello,
>>
>> yet another question/issue with repair.
>>
>> Cassandra 2.1.18, 3 nodes, RF=3, vnodes=256, data volume ~5 GB per node
>> only. A repair (nodetool repair -par) issued on a single node at this
>> data volume takes around 36 min with an average of ~15 MB/s disk
>> throughput (read+write) for the entire time frame, thus processing
>> ~32 GB from a disk perspective, so ~6 times the real data volume
>> reported by nodetool status. Does this make any sense? This is with 4
>> compaction threads and compaction throughput = 64. Similar results when
>> repeating this test a few times, where most/all inconsistent data should
>> already have been sorted out by previous runs.
>>
>> I know there is e.g. Reaper, but the above is a simple use case, arising
>> after a single failed node recovers beyond the 3h hinted handoff window.
>> How should this finish in a timely manner for > 500 GB on a recovering
>> node?
>>
>> I have to admit this is with NFS as storage. I know NFS might not be
>> the best idea, but with the above test at ~5 GB data volume we see an
>> IOPS rate of ~700 at a disk latency of ~15 ms, so I wouldn't call it
>> that bad. This is all running Cassandra on-premise (at the customer, so
>> not hosted by us), so while we can make recommendations storage-wise
>> (of course preferring local disks), it may and will happen that NFS is
>> in use.
>>
>> Why we are using -par in combination with NFS is a different story,
>> related to this issue:
>> https://issues.apache.org/jira/browse/CASSANDRA-8743. Without switching
>> from sequential to parallel repair, we basically kill Cassandra.
>>
>> Throughput-wise, I also don't think it is related to NFS, because we see
>> similar repair throughput values with AWS EBS (gp2, SSD-based) when
>> running regular repairs on small-sized CFs.
>>
>> Thanks for any input.
>> Thomas
>>
>> The contents of this e-mail are intended for the named addressee only.
>> It contains information that may be confidential. Unless you are the
>> named addressee or an authorized designee, you may not copy or use it,
>> or disclose it to anyone else. If you received it in error please notify
>> us immediately and then destroy it. Dynatrace Austria GmbH (registration
>> number FN 91482h) is a company registered in Linz whose registered
>> office is at 4040 Linz, Austria, Freistädterstraße 313
>
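[Editor's aside: Thomas's figures are internally consistent, which is easy to check. The sketch below uses only the numbers quoted in the thread (36 min, ~15 MB/s, ~5 GB live data); the extrapolation to 500 GB assumes repair time scales linearly with data volume, which is an assumption, not something measured in the thread.]

```python
# Back-of-the-envelope check of the repair numbers from the thread.
duration_s = 36 * 60        # repair took ~36 minutes
throughput_mb_s = 15        # ~15 MB/s combined read+write disk throughput
live_data_gb = 5            # ~5 GB live data per node (nodetool status)

total_gb = duration_s * throughput_mb_s / 1024   # MB -> GB
amplification = total_gb / live_data_gb

print(f"total disk traffic: ~{total_gb:.1f} GB")        # ~31.6 GB
print(f"amplification vs live data: ~{amplification:.1f}x")  # ~6.3x

# Naive linear extrapolation to a 500 GB node (assumption, see above):
est_hours_500gb = (36 / 60) * (500 / live_data_gb)
print(f"estimated repair time at 500 GB: ~{est_hours_500gb:.0f} h")  # ~60 h
```

So the "~32 GB / ~6x" in the original mail checks out, and the linear estimate illustrates why Thomas is worried about the > 500 GB recovery case.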