Your compaction time won't improve immediately simply by adding nodes: the new nodes take over token ranges, but the existing SSTables still have to be compacted, and the data streamed away only disappears from the old nodes after cleanup runs.
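As a rough back-of-the-envelope check of why raising the throughput cap helps (all numbers below are illustrative, taken from the figures mentioned later in this thread, not a measurement): the time to drain a compaction backlog scales with the bytes left to rewrite divided by the configured throughput. Note this is a lower bound — compaction rewrites data multiple times across tiers, so the real wall-clock time is larger.

```python
# Rough lower-bound estimate of how long a compaction backlog takes to
# drain at a given throttle. Inputs are illustrative assumptions.

def drain_time_hours(pending_bytes: float, throughput_mb_per_s: float) -> float:
    """Hours needed to rewrite `pending_bytes` once at `throughput_mb_per_s` MB/s."""
    seconds = pending_bytes / (throughput_mb_per_s * 1024 ** 2)
    return seconds / 3600

# ~1.2 TB of SSTable data still to compact on one node
pending = 1.2 * 1024 ** 4

print(round(drain_time_hours(pending, 16), 1))   # throttled at the default 16 MB/s -> 21.8
print(round(drain_time_hours(pending, 200), 1))  # after raising it to 200 MB/s -> 1.7
```

The two knobs this models are `compaction_throughput_mb_per_sec` in cassandra.yaml (changeable at runtime with `nodetool setcompactionthroughput`; 0 disables throttling) and `concurrent_compactors`, which controls how many compaction threads run in parallel.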
What's your end goal? Why is a spike in pending compaction tasks following a massive write an issue? Are you seeing a dip in performance, violating an SLA, or do you just not like it?

On Fri, Apr 27, 2018 at 10:54 AM Mikhail Tsaplin <tsmis...@gmail.com> wrote:

> The cluster has 5 nodes of d2.xlarge AWS type (32GB RAM, attached SSD
> disks), Cassandra 3.0.9.
>
> Increased compaction throughput from 16 to 200 - active compaction
> remaining time decreased.
>
> What will happen if another node joins the cluster? Will the former
> nodes move part of their SSTables to the new node unchanged, and will
> compaction time be reduced?
>
> $ nodetool cfstats -H dump_es
>
> Keyspace: table_b
>     Read Count: 0
>     Read Latency: NaN ms.
>     Write Count: 0
>     Write Latency: NaN ms.
>     Pending Flushes: 0
>         Table: table_b
>         SSTable count: 18155
>         Space used (live): 1.2 TB
>         Space used (total): 1.2 TB
>         Space used by snapshots (total): 0 bytes
>         Off heap memory used (total): 3.62 GB
>         SSTable Compression Ratio: 0.20371982719658258
>         Number of keys (estimate): 712032622
>         Memtable cell count: 0
>         Memtable data size: 0 bytes
>         Memtable off heap memory used: 0 bytes
>         Memtable switch count: 0
>         Local read count: 0
>         Local read latency: NaN ms
>         Local write count: 0
>         Local write latency: NaN ms
>         Pending flushes: 0
>         Bloom filter false positives: 0
>         Bloom filter false ratio: 0.00000
>         Bloom filter space used: 2.22 GB
>         Bloom filter off heap memory used: 2.56 GB
>         Index summary off heap memory used: 357.51 MB
>         Compression metadata off heap memory used: 724.97 MB
>         Compacted partition minimum bytes: 771 bytes
>         Compacted partition maximum bytes: 1.55 MB
>         Compacted partition mean bytes: 3.47 KB
>         Average live cells per slice (last five minutes): NaN
>         Maximum live cells per slice (last five minutes): 0
>         Average tombstones per slice (last five minutes): NaN
>         Maximum tombstones per slice (last five minutes): 0
>
> 2018-04-27 22:21 GMT+07:00 Nicolas Guyomar 
<nicolas.guyo...@gmail.com>:

>> Hi Mikhail,
>>
>> Could you please provide:
>> - your cluster version/topology (number of nodes, CPU, RAM available, etc.)
>> - what kind of underlying storage you are using
>> - cfstats using the -H option, because I'm never sure I'm converting bytes => GB
>>
>> You are storing 1 TB per node, so long-running compaction is not really a
>> surprise; you can play with the concurrent compaction thread count and the
>> compaction throughput to begin with.
>>
>> On 27 April 2018 at 16:59, Mikhail Tsaplin <tsmis...@gmail.com> wrote:
>>
>>> Hi,
>>> I have a five-node C* cluster suffering from a big number of pending
>>> compaction tasks: 1) 571; 2) 91; 3) 367; 4) 22; 5) 232
>>>
>>> Initially, it held one big table (table_a). With Spark, I read that
>>> table, extended its data, and stored it in a second table, table_b. After
>>> this copying/extending process, the number of compaction tasks in the
>>> cluster grew. From nodetool cfstats (see output at the bottom): table_a
>>> has 20 SSTables and table_b has 18219.
>>>
>>> As I understand it, table_b has a big SSTable count because data was
>>> transferred from one table to another within a short time, and eventually
>>> these tables will be compacted. But now I have to read all the data from
>>> table_b and send it to Elasticsearch. When Spark reads this table, some
>>> Cassandra nodes die because of OOM.
>>>
>>> I think that when compaction is complete, the Spark reading job will
>>> work fine.
>>>
>>> The question is: how can I speed up the compaction process? What if I
>>> add another two nodes to the cluster - will compaction finish faster? Or
>>> will data be copied to the new nodes while compaction continues on the
>>> original set of SSTables? 
>>> Nodetool cfstats output:
>>>
>>>         Table: table_a
>>>         SSTable count: 20
>>>         Space used (live): 1064889308052
>>>         Space used (total): 1064889308052
>>>         Space used by snapshots (total): 0
>>>         Off heap memory used (total): 1118106937
>>>         SSTable Compression Ratio: 0.12564594959566894
>>>         Number of keys (estimate): 56238959
>>>         Memtable cell count: 76824
>>>         Memtable data size: 115531402
>>>         Memtable off heap memory used: 0
>>>         Memtable switch count: 17
>>>         Local read count: 0
>>>         Local read latency: NaN ms
>>>         Local write count: 77308
>>>         Local write latency: 0.045 ms
>>>         Pending flushes: 0
>>>         Bloom filter false positives: 0
>>>         Bloom filter false ratio: 0.00000
>>>         Bloom filter space used: 120230328
>>>         Bloom filter off heap memory used: 120230168
>>>         Index summary off heap memory used: 2837249
>>>         Compression metadata off heap memory used: 995039520
>>>         Compacted partition minimum bytes: 1110
>>>         Compacted partition maximum bytes: 52066354
>>>         Compacted partition mean bytes: 133152
>>>         Average live cells per slice (last five minutes): NaN
>>>         Maximum live cells per slice (last five minutes): 0
>>>         Average tombstones per slice (last five minutes): NaN
>>>         Maximum tombstones per slice (last five minutes): 0
>>>
>>> nodetool cfstats table_b
>>> Keyspace: dump_es
>>>     Read Count: 0
>>>     Read Latency: NaN ms.
>>>     Write Count: 0
>>>     Write Latency: NaN ms. 
>>>     Pending Flushes: 0
>>>         Table: table_b
>>>         SSTable count: 18219
>>>         Space used (live): 1316641151665
>>>         Space used (total): 1316641151665
>>>         Space used by snapshots (total): 0
>>>         Off heap memory used (total): 3863604976
>>>         SSTable Compression Ratio: 0.20387645535477916
>>>         Number of keys (estimate): 712032622
>>>         Memtable cell count: 0
>>>         Memtable data size: 0
>>>         Memtable off heap memory used: 0
>>>         Memtable switch count: 0
>>>         Local read count: 0
>>>         Local read latency: NaN ms
>>>         Local write count: 0
>>>         Local write latency: NaN ms
>>>         Pending flushes: 0
>>>         Bloom filter false positives: 0
>>>         Bloom filter false ratio: 0.00000
>>>         Bloom filter space used: 2382971488
>>>         Bloom filter off heap memory used: 2742320056
>>>         Index summary off heap memory used: 371500752
>>>         Compression metadata off heap memory used: 749784168
>>>         Compacted partition minimum bytes: 771
>>>         Compacted partition maximum bytes: 1629722
>>>         Compacted partition mean bytes: 3555
>>>         Average live cells per slice (last five minutes): 132.375
>>>         Maximum live cells per slice (last five minutes): 149
>>>         Average tombstones per slice (last five minutes): 1.0
>>>         Maximum tombstones per slice (last five minutes): 1
>>>
>>> ------------------
>>>
>>> I logged the CQL requests coming from Spark and checked how one such
>>> request performs - it fetches 8075 rows, 59 MB of data, in 155 s (see the
>>> check output below):
>>>
>>> $ date; echo 'SELECT "scan_id", "snapshot_id", "scan_doc",
>>> "snapshot_doc" FROM "dump_es"."table_b" WHERE token("scan_id") >
>>> 946122293981930504 AND token("scan_id") <= 946132293981930504
>>> ALLOW FILTERING;' | cqlsh --request-timeout=3600 | wc ; date
>>>
>>> Fri Apr 27 13:32:55 UTC 2018
>>>    8076   61191 59009831
>>> Fri Apr 27 13:35:30 UTC 2018
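The cqlsh command quoted above reads one slice of the Murmur3 token ring (`token("scan_id") > a AND token("scan_id") <= b`); a full-table export is just many such slices, and this is how the Spark connector pages through the table. Splitting the ring into smaller sub-ranges reduces how many rows any single query materialises, which is one way to ease the read-side memory pressure while the backlog compacts. A minimal sketch of the splitting (the function name is hypothetical; real connectors also handle the edge cases, such as the row that sits exactly on the minimum token):

```python
# Split the Murmur3 token ring into equal sub-ranges for token-based
# full-table scans. Ranges are half-open: (start, end].
MIN_TOKEN = -2**63       # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1    # Murmur3Partitioner maximum token

def token_ranges(n_splits: int):
    """Return n_splits contiguous (start, end] ranges covering the ring."""
    span = (MAX_TOKEN - MIN_TOKEN) // n_splits
    edges = [MIN_TOKEN + i * span for i in range(n_splits)] + [MAX_TOKEN]
    return [(edges[i], edges[i + 1]) for i in range(n_splits)]

for start, end in token_ranges(4):
    print(f'SELECT "scan_id", "scan_doc" FROM "dump_es"."table_b" '
          f'WHERE token("scan_id") > {start} AND token("scan_id") <= {end};')
```

Raising the split count makes each query cheaper at the cost of more round trips; with 18k+ SSTables behind each partition lookup, smaller slices are the safer trade until compaction catches up.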